Friday, 2020-10-16

ianwok, it's been about 15 minutes and i can't see it recovering.  should we restart the container?00:00
fungiback and catching up00:03
fungilooks like the ssh api is also unresponsive?00:03
clarkbfungi: we turned off apache00:04
ianwok, i can run "jstack" in the container and get thread backtraces00:04
clarkbhoping it would settle down00:04
ianw91073 / 91074 are two top threads00:04
fungiturning off apache wouldn't render the ssh api unresponsive too00:05
clarkbya that was what I realized afterwards00:05
clarkbI suggested we wait until 00:00 UTC then restart00:05
clarkband we are now past00:05
clarkbits not any happier either00:06
ianwyou need to convert the thread id to hex?00:06
clarkb(but if ianw is thread dumping we can wait on that)00:06
fungilooks like there was a spike in data being pulled from the database, and cpu/ram immediately shot up00:06
clarkbwe run db backups around 00:00 fwiw00:07
clarkbso need to keep that in mind00:07
ianwso nid in hex is the thread id00:07
fungithe spike for eth1 was earlier than midnight tho00:08
clarkbfungi: k00:08
clarkbin that case a bad query possibly?00:08
clarkbor bad queries00:08
clarkbI think melody would show us the query fwiw00:08
clarkbianw's thread dump might too if it is that00:08
fungilooks like the db utilization started around 22:55 and topped out 15 minutes later00:09
ianwok, top -H in the container gives the thread id's00:09
ianwso 0xE is pegged at 100%00:09
fungiyeah, and it looks like the jvm is maxxed out on its allowed memory00:11
ianw"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007ff70c028000 nid=0xe runnable00:11
clarkbah00:12
ianwhttp://paste.openstack.org/show/799105/00:12
ianwbasically, i would say all the threads pegged at 100% are the GC threads00:12
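The matching step worked out above (decimal thread id from `top -H`, hex `nid` in the `jstack` output) can be sketched in Python. This is only a rough sketch: `busy_thread_names` is a hypothetical helper, and the sample dump contains nothing but the single jstack line quoted in this log.

```python
import re

# Header line of one thread in jstack output, for example:
# "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x... nid=0xe runnable
THREAD_HEADER = re.compile(r'^"(?P<name>.+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def busy_thread_names(jstack_text, busy_tids):
    """Map decimal thread ids (as shown by `top -H`) to jstack thread names.

    jstack prints the native thread id in hex in the `nid=` field, so the
    hex value is parsed to an int and compared with the decimal ids.
    """
    wanted = set(busy_tids)
    found = {}
    for line in jstack_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match is None:
            continue
        nid = int(match.group("nid"), 16)
        if nid in wanted:
            found[nid] = match.group("name")
    return found


# The one thread header quoted in the log; thread id 14 is 0xe in hex.
sample = (
    '"GC task thread#5 (ParallelGC)" os_prio=0 '
    'tid=0x00007ff70c028000 nid=0xe runnable\n'
)
print(busy_thread_names(sample, [14]))
```

Feeding it the full dump plus every id that `top -H` shows near 100% would list which threads are spinning, which here would presumably all be the ParallelGC workers.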
clarkbso theory time: some event happens that causes us to need many memories00:13
clarkbthen we fall over because we can't GC fast enough00:13
clarkbwe have had GC problems in the past00:13
clarkbI don't think we'll learn anything new from the thread dump unless we want to dump all threads and hope for query type info maybe?00:14
clarkbianw: are you able to get memory allocation per thread somehow ? (that requires crawling pointers to the heap I'm betting)00:14
clarkblikely that it won't just do that00:14
ianwno; yeah all the threads stuck at 100% are the GC threads00:14
clarkbI guess we restart then?00:14
fungii think so, yeah00:15
ianwerrit@review01:/$ jmap 700:15
ianwAttaching to process ID 7, please wait...00:15
ianwERROR: ptrace(PTRACE_ATTACH, ..) failed for 7: Operation not permitted00:15
ianwError attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 7: Operation not permitted00:15
ianwin theory that could give us heap knowledge etc, but ... i don't know00:16
ianwpermissions00:16
ianwso, docker-compose down & up ?00:16
clarkbianw: googling says your jmap thing needs to run as the same user as the process that is running00:17
fungidown, up -d00:17
clarkblooks like you were gerrit but is the user gerrit2?00:17
clarkbanyway ya its down then up -d00:17
ianwhrm, i was in the container00:17
clarkboh in the container we call it gerrit got it00:17
clarkbthen ya I don't have other ideas00:17
ianwrestarting now00:18
ianwapache back on now too00:19
clarkberror_log doesn't show it is up yet fwiw00:20
*** DSpider has quit IRC00:20
clarkband now it says it is ready. Usually takes apache a minute to catch up00:21
ianw[2020-10-16 00:20:45,068] [main] INFO  com.google.gerrit.server.config.ScheduleConfig : gc schedule parameter "gc.interval" is not configured00:21
ianwdunno if that means anything00:21
ianwoh that's git collection anyway00:23
clarkbmelody data doesn't seem to survive a restart; iirc it never has, but that may be a good thing to try and fix00:23
clarkbfungi: was the previous restart a similar situation? java spinning the cpu and not doing much else?00:24
fungiwhen was the previous restart? i need to jog my memory now00:24
clarkbthe 13th says logs I think00:25
clarkbjust a few days ago. I woke up one day and it had been restarted due to lack of responsiveness00:25
clarkbhttps://review.opendev.org/#/c/628296/ spams the logs with mergeability check errors00:26
fungihttp://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-10-13.log.html#t2020-10-13T12:10:2400:26
clarkbmaybe we should abandon that change just to clean up our error logs?00:27
fungii saw lots of "org.eclipse.jetty.io.SelectorManager : Could not process key for channel java.nio.channels.SocketChannel[connected local=/127.0.0.1:8081 remote=/127.0.0.1:47906]"00:28
fungiin the error log00:28
fungithis week's been so crazy i forgot we had a gerrit outage 3 days ago00:30
clarkbwe have had memory/gc issues in the past which I believe we solved by upgrading gerrit00:33
clarkbold gerrit leaked00:33
clarkbbnemec seems to do some chatty ssh queries too00:34
clarkbthey time out after 5 seconds though00:35
fungican someone with a little more insight into this outage #status log the restart so we don't forget the circumstances?00:43
clarkbhow about #status log Gerrit was restarted after it ran out of memory and spent all of its CPU cycles trying to garbage collect.00:44
clarkbianw: fungi ^ does that look accurate?00:44
ianw#status log restarted gerrit container.  cpu pegged and jstack of the busy threads in the container showed all were gc related00:45
openstackstatusianw: finished logging00:45
ianwoh oops i meant to suggest that ... but, well there we go00:45
clarkbthe details seem to overlap00:46
fungiwfm, thx00:46
corvusfungi: i found this article very amusing, and i note that your nick seems to have broken my ability to read certain phrases without cognitive dissonance: https://www.npr.org/2020/10/15/923411578/a-disturbing-twinkie-that-has-so-far-defied-science00:50
corvus"Eventually, all of us are food for fungi."00:50
clarkbI'm going to go figure out dinner now00:50
clarkbif we notice this again I think we should try and identify what the server is doing via melody if possible00:50
clarkbif you click on the + details buttons for threads it gives you a lot of thread info00:51
clarkbfrom that you may be able to infer what is busy and the source of the problem00:51
fungicorvus: i appreciate the effect my nickname choice inflicts on others00:52
clarkbjust don't click on the red stop sign when you do that as it will kill the threads :)00:52
clarkboh also the dump threads as text gives you slightly different perspective too00:52
fungicorvus: i have even more trouble reading that one because i have a long-time friend whose nick is "twinkie"00:52
clarkbthe text dump has tracebacks for each thread00:52
corvusclarkb: yeah, i think you can get all the thread dumps in one page00:52
clarkbbut not the timing info so you can kind of correlate between the two things00:53
clarkbcorvus: yup00:53
corvussaving that would be ++00:53
ianwclarkb: you get pretty good info with jstack in the container00:56
clarkbianw: what does the invocation for that look like?00:56
ianwexec -it /bin/bash into the container, then you can ps in there to get the java process in the container, then just jstack <pid>00:57
ianwi think it's actually probably the same info the melody page has00:57
ianwbut yeah, with the spinning threads all just being the ~10 GC threads ... hard to say what drove it mad :/00:58
ianwi feel like i've done that before, pre-container days, and seen similar.  we had a period of gerrit instability for a while where we restarted pretty frequently00:58
clarkbyup, I think that was on an older version that had leaks00:58
clarkbthen when we got to 2.13 it was fine00:59
clarkbmaybe we've managed to tickle a new leak or similar problem in 2.1300:59
ianwi think, given plans, extensive investigation isn't worth it :)01:00
ianwgrabbing some lunch in the sun, bib01:01
clarkblooking at cacti we seem to have high memory use a few times a year01:01
clarkbooh sun01:01
clarkbour current memory use is above the previous baseline but not terrible01:01
clarkband ya I said I'd do dinner then didn't I'm really popping out now01:01
*** ysandeep is now known as ysandeep|afk01:28
*** hamalq has quit IRC02:54
*** hamalq has joined #opendev03:08
ianwhrm, i think openstackgerrit has disappeared again03:54
*** openstackgerrit has quit IRC03:57
*** marios has joined #opendev05:14
*** openstackgerrit has joined #opendev05:34
openstackgerritIan Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/75766005:34
ianwfungi: ^ that is pretty much ready to go, modulo testinfra bits.  so if you have any comments on the core of it feel free to make them05:36
*** tkajinam_ has joined #opendev05:46
*** tkajinam has quit IRC05:49
*** ysandeep|afk is now known as ysandeep06:12
*** roman_g has joined #opendev06:18
*** lpetrut has joined #opendev06:20
*** hamalq has quit IRC06:20
*** ykarel|away has joined #opendev06:26
*** ykarel|away has quit IRC06:27
*** tkajinam_ has quit IRC06:56
*** tkajinam has joined #opendev06:57
*** hashar has joined #opendev07:13
*** andrewbonney has joined #opendev07:14
*** slaweq has joined #opendev07:29
*** ysandeep is now known as ysandeep|lunch07:35
*** slaweq has quit IRC07:35
*** hamalq has joined #opendev07:37
*** hamalq has quit IRC07:42
*** slaweq has joined #opendev07:58
*** Tengu has quit IRC08:00
*** mkalcok has joined #opendev08:02
*** ysandeep|lunch is now known as ysandeep08:09
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: ensure-packer: ensure unzip in role to make ensure-packer self contained  https://review.opendev.org/75853508:15
*** hamalq has joined #opendev08:15
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: ensure-packer: ensure unzip in role to make ensure-packer self contained  https://review.opendev.org/75853508:16
*** slaweq has quit IRC08:17
*** hamalq has quit IRC08:20
*** Tengu has joined #opendev08:22
*** Tengu has quit IRC08:36
*** Tengu has joined #opendev08:43
*** ysandeep is now known as ysandeep|afk08:45
*** hashar has quit IRC08:58
*** tosky has joined #opendev08:59
*** slaweq has joined #opendev08:59
*** hashar has joined #opendev09:01
*** tkajinam has quit IRC09:36
*** slaweq has quit IRC09:38
*** ysandeep|afk is now known as ysandeep10:03
*** hamalq has joined #opendev10:16
*** hamalq has quit IRC10:21
*** chandankumar has quit IRC10:25
*** Tengu has quit IRC10:41
*** Tengu has joined #opendev10:53
*** DSpider has joined #opendev11:05
*** hashar has quit IRC11:14
*** slaweq has joined #opendev11:31
*** slaweq has quit IRC11:39
*** hashar has joined #opendev12:01
*** hamalq has joined #opendev12:17
*** hamalq has quit IRC12:22
*** hamalq has joined #opendev12:53
*** fressi has joined #opendev12:56
*** hamalq has quit IRC12:58
*** hashar has quit IRC13:13
*** priteau has joined #opendev13:39
fungigrowing up in a mountain forest, it's funny i never realized how much i took the abundance of rocks and dirt for granted until i relocated to a sandbar. making a quick trip to the garden center to buy some, back shortly13:52
*** slittle11 has joined #opendev13:53
slittle11Please add me as the first core to the new group starlingx-snmp-armada-app-core.  I can populate the rest of the members.13:55
*** slittle11 is now known as slittle113:55
*** fressi has quit IRC14:02
fungi#status log added slittle11 as initial member of starlingx-snmp-armada-app-core group in gerrit14:05
openstackstatusfungi: finished logging14:05
*** slaweq has joined #opendev14:06
slittle1thank you14:07
fungiyou're welcome!14:07
*** hamalq has joined #opendev14:19
*** ysandeep is now known as ysandeep|away14:21
*** hamalq has quit IRC14:24
zigoHi there!14:33
zigofungi: How can I get the CI try to run with amqp 5.0.1 ?14:33
zigoWe're in a bit of a dependency hell in Debian Sid, where I updated kombu, but that broke celery, which now needs to be updated, but then it requires vine 5, which in turn needs amqp 5...14:34
zigoSo, I'd like to know if Victoria can be used with AMQP 5 too...14:34
clarkbzigo that should be controlled by constraints14:36
zigoclarkb: Sure, but in what repo?14:36
clarkbrequirements14:36
zigoclarkb: If I push a change in the global-reqs, will it run some tests?14:36
clarkbyes and you can use depends on14:37
clarkblooks like the amqp constraint may be limited by something else as it is 2.6.1 so there may be a bit of untangling14:39
zigoclarkb: Thanks.14:41
*** slaweq has quit IRC14:46
*** mkalcok has quit IRC14:52
fungiif you run the constraints generation build logs, it should say why 2.6.1 is getting chosen14:55
fungier, rather look at the constraints generation build logs14:55
fungiactually it seems like the propose-updates periodic job for openstack/requirements doesn't log the generate-constraints output (or we don't collect where it's logged to)15:01
clarkbdoes it go to the console log?15:01
funginot that i can tell15:02
fungithough this is also worrisome: https://zuul.opendev.org/t/openstack/build/b156209eb3a845728ab26cd061dd7a40/log/job-output.txt#75615:02
clarkbaha 2.6.1 is only from july and 5.0 is the next release from last week15:02
clarkbprobably still something has a cap but we aren't super behind15:03
clarkbfungi: my favorite part is it is already in the requirements repo :)15:04
fungizigo: looks like it's already proposed in https://review.opendev.org/750084 but openstack has been under a requirements freeze leading up to the victoria release two days ago15:07
zigofungi: That's fine, I just need to know if it would work or just break everything in Debian...15:08
fungizigo: reviewing the job results on that change would likely be a good start, though it's updating a lot of stuff coming out of the freeze period so the failures may not be related to new amqp15:10
fungizigo: also there's a #openstack-requirements channel where this might be more on-topic15:10
clarkbanother thing to do would be to check why they changed version number from 2.6.1 to 515:11
zigoclarkb: They removed a bunch of Py2 compat code in amqp ...15:11
clarkbdropped python2 and python3.5 and older support. stopped using ssl.wrap_socket15:12
zigoAll of this is Celery stuff, they moved all to version 5: vine, celery, amqp ...15:12
fungiwhich is likely fine for openstack wallaby, we're not running any py27 jobs on requirements changes since a cycle already15:12
fungior py35 for that matter15:12
zigoThey pretend that the modules are independent, it's just not the reality.15:12
zigoYeah, Py 3.5 is old ...15:13
zigoThat's Ubuntu 16.04 / Debian Jessie ...15:13
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site  https://review.opendev.org/75849515:19
openstackgerritJeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers  https://review.opendev.org/75849615:19
*** lpetrut has quit IRC15:26
fungiugh, static sites are slow again. i wonder if they're hitting another vhost on there now15:31
fungii can't get the mod_status details to load15:32
fungichecking cacti graphs15:32
fungioh yeah, graphs show it's getting slammed again15:33
clarkblooks like tarballs.opendev.org15:34
fungiaha, ansible undid our filter on the tarballs site. adding it back manually15:34
fungiyeah15:34
clarkbdid ansible maybe undo our fix15:34
fungivhost config was modified 07:07 utc today15:34
clarkbon the mirror apache config cleanup the test that is failing is for dockerv1 api15:35
clarkbdockerhub has completely deprecated that (no pulls) since middle of last year15:35
clarkbI think we should just rm that proxy15:35
fungiback in place now15:35
fungishould clear up in a moment15:35
clarkbI'll put a cleanup change for that under my order deny allow satisfy cleanup15:35
clarkbfungi: when static settles can you look at the root screen on review test and check that my debugging of the manage-projects noop makes sense?15:36
clarkbif so I'll do that file copy and we can rerun manage-projects there15:36
fungiclarkb: yeah, i concur the cp command you have there makes sense given how the files are being mapped into the container15:36
clarkbk I'll do that cp now15:37
fungishould have fresh check results in for topic:ua-filter shortly and then hopefully we can speedily review it so ansible doesn't undo things again15:37
clarkb++15:38
clarkbdoes the manage projects command there look good and ready to go (in the root screen)?15:38
fungiyeah, lgtm15:38
clarkbhttps://review-test.opendev.org/admin/repos/clarkb/clarkb-test-project exists and https://review-test.opendev.org/gitweb?p=clarkb/clarkb-test-project.git;a=blob;f=.gitreview;h=0a41cd5553131a0ba126aed8644620c6ceab4713;hb=00fa933c012414a2f05fb7cf2a9c12a620aa9e79 lgtm15:40
clarkbalso it tested the change of default branch to main :)15:40
clarkbI think we can check that off now as working. \o/15:40
fungiyep, perfect!15:41
openstackgerritClark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives  https://review.opendev.org/75846915:44
openstackgerritClark Boylan proposed opendev/system-config master: Remove docker v1 registry proxy from our mirrors  https://review.opendev.org/75858515:44
clarkbnext I'll be testing a rename. Which I think the process is stop gerrit, move git repo, start gerrit and trigger online reindex15:45
*** hamalq has joined #opendev15:45
clarkbwill push up a change before I do that otherwise we can't validate reindexing15:45
fungiyup. i guess comment out the tasks for the gitea hosts15:45
clarkboh I wasn't going to use the playbook15:46
clarkbbut maybe I should15:46
clarkbI think we essentially just rm all the extra bits in a rename since the renames already do a mv and reindex?15:47
fungimm, system-config-run-static and system-config-run-gitea did not like my filter changes15:48
fungii'll dig deeper15:48
fungiDestination directory /etc/apache2/conf-enabled does not exist15:49
fungii guess the roles creating our apache tuning file make that dir15:49
clarkbI believe that apache2.conf loads from that dir odd that the package wouldn't mkdir the dir if it is trying to load from it in its config15:50
fungiahh, we install apache in those other roles so it makes that dir for us15:51
fungii guess i should copy apache installation and module enablement into this role too15:51
fungior reconsider the choice to add it to the playbook and include it in the other roles instead. roles including other roles just seems like spaghetti to me15:52
fungibut maybe it's preferable to duplicated tasks15:53
clarkbit should largely noop if we duplicate the steps and that way you don't have to do prework15:53
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site  https://review.opendev.org/75849515:54
openstackgerritJeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers  https://review.opendev.org/75849615:54
fungiokay, now the parent change also installs apache and enables mod_headers and mod_rewrite15:55
clarkbI think I'll just do the testing by hand on review-test for now. Mostly concerned about whether gerrit can do it more than automating the specific process at this point16:02
clarkb(mostly because its more straightforward and this week was a bunch of fires and I want to make progress)16:03
clarkbhttps://review-test.opendev.org/c/clarkb/clarkb-test-project/+/755651 I had to manually add defaultbranch to the gitreview settings to change it from master. I'll make a jeepyb change to set that too I guess16:07
clarkbthe problem with this testing is my todo list only gets longer :P16:07
fungithat is a general problem with todo list, in my experience16:09
fungii've never had one get shorter16:09
clarkbI've renamed the project and confirmed that pre reindex you can't rely on the change number redirected urls16:12
clarkbif you construct a url with the new project name in it that works16:12
clarkbchange reindex is running now and we'll have to wait and see if that fixes the change number url redirects (I expect it will)16:13
clarkbhrm looking at the rename playbook we update project watches in the db. I guess I should test that with a rename too16:15
*** marios is now known as marios|out16:16
openstackgerritJeremy Stanley proposed opendev/system-config master: Update static Apache configs to 2.4 ACL primitives  https://review.opendev.org/75859316:17
fungiclarkb: ^16:17
fungihopefully that's not going to break anything16:17
*** marios|out has quit IRC16:18
openstackgerritClark Boylan proposed opendev/system-config master: Post gerrit upgrade rename playbook  https://review.opendev.org/75859416:22
openstackgerritClark Boylan proposed opendev/jeepyb master: Set default branch in .gitreview files when creating project  https://review.opendev.org/75859516:26
fungiargh, new error16:30
openstackgerritClark Boylan proposed opendev/jeepyb master: Make local git dir creation optional  https://review.opendev.org/75859716:33
fungihuh, ansible-lint takes a lot longer than i realized on our system-config repo16:34
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site  https://review.opendev.org/75849516:39
openstackgerritJeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers  https://review.opendev.org/75849616:39
fungiforgot you need mod_macro to make use of macro directives16:40
clarkbwe consume jeepyb from source not releases right?16:40
clarkbI think that is the case16:40
fungiyes16:40
fungiwe haven't ever tagged jeepyb have we?16:40
fungiyeah, no tags16:41
openstackgerritClark Boylan proposed opendev/system-config master: Stop managing gerrit's local git mirror dir  https://review.opendev.org/75859816:42
openstackgerritClark Boylan proposed opendev/system-config master: Clean up cron tab entry from ansible once removed from host  https://review.opendev.org/75859916:42
clarkbhttps://review.opendev.org/758595 https://review.opendev.org/758597 https://review.opendev.org/758598 and https://review.opendev.org/758599 should all be fine to land on 2.13 (but as always careful review is appreciated)16:43
fungiaverage runtime for tox-linters on our system-config repo is 20 minutes according to zuul's stats16:45
clarkbreindexing changes was suspiciously quick, I assume they made it faster when it could infer things were already at the right index?16:46
fungii had heard it was faster in new gerrit16:46
clarkbhttps://review-test.opendev.org/755651/ redirects properly now though. I'm going to watch the project and star the change, then rename it again and check that those elements update properly16:47
fungiwatches are still in the rdbms right?16:48
fungino, wait, that was file reviews16:48
fungiso watches are in notedb now?16:48
clarkbya I believe so16:49
clarkbI just don't know where exactly  which is why I want to test them with renames16:49
*** hamalq has quit IRC16:49
fungisure16:49
clarkbif they aren't in the project being watched then we may have to edit some git repos somewhere All-Users or All-Projects likely16:49
clarkbI did find a mailing list thread from luca indicating that repo mv and reindex is all that should be necessary so I am hopeful this is all sorted out16:51
clarkbpre reindex my watch hasn't updated16:52
zbrfungi: if you call ansible-lint with xargs or use an outdated version, you will see bad performance.16:54
clarkbaccounts have reindexed and my watch hasn't updated. I suspect that these are stored in a central repo16:55
clarkbwill double check after changes reindex in case that is somehow a fix to the problem16:55
clarkbfungi: on the UA filter change, you have two handlers one for reload and one for restart. You use restart even though loading the mod rewrite rules should happen on reload? Is that due to the addition of the modules?16:57
fungiclarkb: yes, module addition needs a restart16:57
fungioh, though i guess i could just add that on the module tasks16:58
clarkbfungi: ya I wonder if we should have module tasks do a loop then notify restart then the file change can notify reload16:58
clarkbthen if any module changes we restart and if the config changes we reload16:58
*** hamalq has joined #opendev17:14
*** hamalq has quit IRC17:15
*** hamalq has joined #opendev17:15
*** mlavalle has joined #opendev17:16
clarkbchanges have finished reindexing and the star is carried over but the watched project is not17:27
clarkbis it terrible that I'm initially thinking: people can just update their watches :P17:27
clarkbits been a long week. I'll dig into the notedb to see where that is stored soon17:27
fungiinfra-root: rackspace opened a ticket to let us know there's a disruptive database maintenance scheduled for ~2 weeks out impacting our "testmt" trove instance in dfw... looks like it was one set up to test percona clustering. doesn't seem like we're using it for anything, can i delete it?17:29
fungimordred: ^ if you happen to be around, you might know17:30
clarkbok All-Users:refs/users/XY/ABXY/watch.config is where watches go now. ABXY is your account id number17:30
fungiouch17:31
fungithat's not going to be easy to update17:31
fungibasically have to check out the user-specific ref for every single user to check its watches, then push commits for any which match?17:32
clarkbya something like for id in user_ids: git fetch gerrit refs/users/id[-2:]/id && git checkout FETCH_HEAD && sed -i -e 's/"oldprojectname"/"newprojectname"/g' watch.config && git commit -a -m "rename project oldname to project newname" && git push gerrit HEAD:refs/users/id[-2:]/id17:33
clarkbmaybe with a if git diff in the middle to check if we updated anything17:34
fungiand we have how many thousand users to traverse?17:34
clarkb36k ish17:34
fungigit checkout is not exactly fast17:34
fungiyeah, not happening17:34
fungipeople can re-set their watches when projects rename, i guess17:34
* fungi doesn't set his watch since he never wears it17:35
clarkbya I think we can basically say thats a known issue for now and if someone can make it fast they win17:35
fungigit can cat a file from an arbitrary ref, so maybe no need to checkout unless we find a match17:36
clarkbmy earlier statement wasn't super clear the ref is refs/users/XY/ABXY and then in that working tree is a watch.config17:36
clarkbfungi: ooh good idea17:36
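The per-user rewrite discussed above, with fungi's suggestion of reading the file straight from the ref so a checkout is only needed on a match, could look roughly like this in Python. Everything here is a sketch under assumptions: the helper names are hypothetical, the `[project "name"]` section layout of watch.config is assumed rather than taken from this log, and the `git show` command is only built, not run.

```python
import re


def user_ref(account_id):
    """Return the sharded All-Users ref for an account id.

    The shard directory is the last two digits of the id (the XY in ABXY),
    zero-padded for ids below 10.
    """
    return "refs/users/%02d/%d" % (account_id % 100, account_id)


def show_watch_config_cmd(account_id):
    """Build a git command that prints watch.config without a checkout."""
    return ["git", "show", "%s:watch.config" % user_ref(account_id)]


def rename_project(watch_config, old, new):
    """Rewrite [project "old"] section headers to [project "new"].

    Assumes watch.config uses git-config style sections keyed by project
    name. Returns (new_text, changed) so untouched accounts can be skipped.
    """
    pattern = re.compile(
        r'^([ \t]*\[project[ \t]+")%s("\][ \t]*)$' % re.escape(old),
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new + m.group(2), watch_config
    )
    return new_text, count > 0


# Hypothetical file content and project names, for illustration only.
sample = '[project "clarkb/old-name"]\n\tnotify = * [ALL_COMMENTS]\n'
print(user_ref(1234))
print(rename_project(sample, "clarkb/old-name", "clarkb/new-name")[0])
```

Only accounts where `changed` comes back true would then need the fetch, commit and push steps, which is where most of the time in the 36k-user loop would otherwise go.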
clarkbthere may also be a gerrit api that we can ask for a list of users to modify (or even to do the modification for us)17:37
clarkbthere are rest apis for this17:40
clarkbthat may not be faster but is likely going to be easier to reason about?17:40
clarkbhttps://gerrit-review.googlesource.com/Documentation/rest-api-accounts.html#set-watched-projects17:40
clarkb36k get requests then some smaller number of deletes and adds17:41
clarkbI expect this is solvable but probably also not necessary before we upgrade17:41
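The REST route mentioned above (one GET per account, then a smaller number of deletes and adds) can be sketched as a pure planning step in Python. This is a hedged sketch, not the rename playbook: `plan_watch_rename` is a hypothetical helper, the sample response is made up, and the entry fields (`project`, `filter`, `notify_*`) are assumed to follow the watched-projects documentation linked above. No HTTP requests are made.

```python
import json

# Gerrit prefixes JSON responses with this guard line.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip the guard prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def plan_watch_rename(watches, old, new):
    """Return (to_delete, to_add) request payloads for one account.

    `watches` is the decoded list from GET /accounts/{id}/watched.projects.
    Entries for other projects are left alone.
    """
    to_delete, to_add = [], []
    for watch in watches:
        if watch.get("project") != old:
            continue
        # Assumption: a delete entry is identified by project plus filter.
        entry = {"project": old}
        if "filter" in watch:
            entry["filter"] = watch["filter"]
        to_delete.append(entry)
        # Re-add the same watch settings under the new project name.
        to_add.append(dict(watch, project=new))
    return to_delete, to_add


# Made-up response body for illustration.
body = ")]}'\n" + json.dumps([
    {"project": "old/name", "notify_all_comments": True},
    {"project": "other/project", "notify_new_changes": True},
])
deletes, adds = plan_watch_rename(parse_gerrit_json(body), "old/name", "new/name")
print(deletes, adds)
```

Accounts for which both lists come back empty need no follow-up requests, so the write traffic stays proportional to the number of people actually watching the renamed project.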
fungiclarkb: ianw: topic:ua-filter changes are passing tests now17:42
clarkbfungi: do you want to update the base change to differentiate between when it should reload and when it should restart?17:42
clarkbor do you want to land that as is and do a followup?17:42
fungiclarkb: oh, yeah i can give that a shot17:42
*** andrewbonney has quit IRC17:43
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site  https://review.opendev.org/75849517:45
openstackgerritJeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers  https://review.opendev.org/75849617:45
fungiclarkb: also not sure if you saw, but i pushed up 758593 to remove the satisfy all directives from the static site vhosts17:52
clarkbOh I hadnt will review both sets shortly17:53
fungialso i think we've got more bitrot in pbr jobs17:55
fungilooks like my changes which had been passing tempest-full and pbr-installation-openstack-pip-dev jobs started failing them a few weeks later once they were approved17:57
clarkbpbr not working on focal for some reason maybe?17:58
fungii think so... the tempest job seems to be failing to install python-guestfs (it should probably be using python3-guestfs instead)17:58
fungithe pip-dev job seems to be hitting a test timeout on pbr.tests.test_integration.TestIntegration.test_integration(octavia)17:59
clarkbthose integration tests install the various projects. That could be a regression in pip if its really slow18:00
clarkbfungi: all the apache changes lgtm18:01
fungithanks18:01
fungiahh, also timing out on pbr.tests.test_integration.TestIntegration.test_integration(nova)18:02
clarkbfor the timeouts I would try increasing the test timeout and see if it finishes in a reasonable amount of time. If not we may have to dig into a potential regression in performance there18:06
fungiclarkb: looking closer, there's a tempest-full (which is failing to install python-guestfs instead of python3-guestfs on ubuntu-focal) and a tempest-full-py3 which is working18:08
fungii bet tempest-full should not have been moved to focal18:08
fungibecause it's doing python 2.7 testing18:08
clarkboh ya I think there is a change up to drop the python2 test18:09
clarkbI suggested that it be replaced with a train python2 job instead18:10
clarkbsince really the intent there is to be able to continue to support installations for supported openstack versions with pbr18:10
clarkbthere was a related change to switch to pre commit which I reviewed18:10
fungiit looks like the tempest-full job should never have been moved to focal though... it declares "USE_PYTHON3":false18:11
clarkb++18:11
openstackgerritClark Boylan proposed opendev/system-config master: More old apache acl cleanups  https://review.opendev.org/75861118:16
clarkbfungi: ^ more apache cleanups18:16
fungithanks! i saw those in there too and figured they should be a separate change18:16
clarkbya gerrit is the last one with the old stuff that I see18:17
clarkbbut need more investigating to understand what those Directory blocks are even doing there18:18
clarkbI think we can just rm them all18:18
fungimaybe we can see if it breaks review-test and then just roll that into the upgrade changeset18:23
clarkbya that may be the safest route18:27
mordredclarkb: I am 100% sure that anything labeled "testmt" can be deleted18:38
clarkbfungi: ^18:38
fungimordred: thank you!!! deleting now18:41
fungi#status log deleted trove instances of "testmt" percona ha cluster from 2020-06-2718:46
openstackstatusfungi: finished logging18:46
fungiis there a way to override the default branch for all requires-projects but not for the project triggering the build?18:56
fungispecifically this is for trying to run tempest-full on pbr changes (pbr has only a master branch) but with stable/train of all the openstack projects (so as to be able to run with python 2.7)?18:57
clarkbfungi: I think pbr will fall back to master as it doesn't have stable branches18:57
clarkbbut maybe that will error instead18:57
fungiokay, so just declaring override-branch: stable/train will work maybe18:57
fungitrying that now18:57
fungigmann is suggesting to do override-checkout instead, curious now to try both and compare the results19:05
gmanni think override-branch just overrides the branch of the running repo, and all the rest (including devstack) comes from master? I partially remember this from when ironic was facing the issue where their stable branch job on the master gate was doing override-branch instead of override-checkout19:07
fungiyeah, i suspect you're right19:10
fungioverride-branch is basically doing the inverse of override-checkout for this19:10
clarkbfungi: should we take the friday opportunity to land some of these apache changes and get them checked out / reverted if necessary?19:26
clarkbI expect we'd have to wait for ianw's monday to get a second reviewer on them otherwise19:26
fungiyeah, may as well. i'll take a quick look through yours while dinner finishes cooking19:28
clarkbshould I approve yours or do you want to do those in the same pass?19:29
fungiclarkb: 758585 is failing ci19:30
clarkblooking19:32
fungifeel free to approve my stack, i already approved your misc one19:33
clarkbit failed to restart apache19:33
fungiwhich usually means a config error19:33
clarkbmaybe I need to remove my listen directives?19:34
fungilooking19:34
clarkbbut the child change passed testing and ran the same job. I'll remove the listen directives for completeness but not sure that's related19:34
openstackgerritClark Boylan proposed opendev/system-config master: Remove docker v1 registry proxy from our mirrors  https://review.opendev.org/75858519:35
openstackgerritClark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives  https://review.opendev.org/75846919:35
* clarkb goes to approve fungi's 3 changes19:35
clarkbthough maybe we should approve the UA ones first then apply satisfy removal over the top19:36
clarkbsince we don't want to remove the ua filtering if satisfy removal merges and the other doesn't19:36
clarkbya I'll do it that way, the UA changes are approved; once those land I'll approve the satisfy removal19:36
*** tosky has quit IRC19:37
fungithe apache error log for that host didn't have anything of note in it19:38
fungii want to say we've seen a similar potential race or other flakiness around letsencrypt where the cert file winds up not getting created, and the apache restart failure is a cascade error19:39
*** tosky has joined #opendev19:39
*** slaweq has joined #opendev19:39
fungiwe're basically calling on acme to do public api operations, it's not mocked19:39
*** priteau has quit IRC19:40
clarkbI just sent a followup to infra root and luca about gerrit things we've learned in case luca had input on any of them19:54
openstackgerritMerged opendev/system-config master: More old apache acl cleanups  https://review.opendev.org/75861119:56
fungithanks!19:59
openstackgerritMerged opendev/system-config master: Block restricted user agents for the tarballs site  https://review.opendev.org/75849520:06
*** slaweq has quit IRC20:12
clarkbcacti's apache config just updated and I can still get graphs so nothing broke there20:20
clarkbfungi: tarballs should be running nowish20:27
clarkbif you confirm it's happy when done I'll approve the static Satisfy any cleanup20:27
clarkbit seems to have applied at least, not sure if functional yet20:33
clarkbhttps://tarballs.opendev.org/ still loads anyway20:33
*** roman_g has quit IRC20:33
*** roman_g has joined #opendev20:34
clarkbtailing the apache logs I don't see any of the sad making UAs20:35
clarkbgoing to call that good and approve the Satisfy any cleanup20:35
fungiyeah, it should have been a no-op since the change was already manually applied20:36
fungiindeed, it corrected a trailing space on one line of the ua-filter.conf and didn't modify the vhost config at all20:38
fungiso looks good20:38
*** roman_g has quit IRC20:38
openstackgerritMerged opendev/system-config master: Use the apache-ua-filter role on Gitea servers  https://review.opendev.org/75849620:42
openstackgerritMerged opendev/system-config master: Remove docker v1 registry proxy from our mirrors  https://review.opendev.org/75858520:42
openstackgerritMerged opendev/system-config master: Update static Apache configs to 2.4 ACL primitives  https://review.opendev.org/75859321:06
*** tosky has quit IRC22:29
*** qchris has quit IRC22:41
*** qchris has joined #opendev22:54
*** mlavalle has quit IRC23:27
fungiclarkb: 758612 seems to be working for keeping tempest-full on pbr but running it with stable/train of projects23:51
fungithat should get us back to being able to merge pbr changes again. i think the test timeouts i saw in the pip-dev job were signs of a "slow node" but we should keep an eye out23:52
clarkbI'll take a look23:52
clarkblgtm, any idea if anyone else is reviewing those?23:57
fungii've commented on the change for dropping the tempest-full job, and also tried to drum up interest in #openstack-oslo, so someone may notice23:59
