Monday, 2024-01-29

09:10 <frickler> tonyb: not sure about the status of the inmotion cleanup; according to grafana there are still 28 nodes stuck in deleting?
10:19 *** tosky_ is now known as tosky
13:48 <opendevreview> Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchiestrator, exn-middleware, overlay-network-agent  https://review.opendev.org/c/openstack/project-config/+/907060
15:25 <jrosser> i know not much can be done, but it looks like connectivity between me and opendev.org is pretty terrible again
15:26 <jrosser> and mtr points to above.net / zayo as the culprit again
15:32 <fungi> jrosser: ipv4, v6, both?
15:33 <jrosser> ah, i'm not on a dual stack vm right now
15:33 * jrosser tries something else
15:35 * frickler still has no v6 connectivity to opendev.org. (or again? I keep losing track. likely should add that topic to the preptg agenda, too)
15:36 <jrosser> i see this for v4 https://paste.opendev.org/show/bVNawMTqNZJenQUqe40F/
15:36 <jrosser> though i'm not sure that actually reports end to end loss
15:38 <frickler> well, end to end loss is in the final line, which says 0%, so that looks fine
15:46 <jrosser> it's the same symptom that i've seen before: only kbit/s throughput on git things
16:02 <frickler> jrosser: here's an mtr in the opposite direction to the first public IP in your trace. looks like there might be congestion between zayo and bbc https://paste.opendev.org/show/bVonvvCAVp1gj1U8Sk5x/
16:41 <opendevreview> Tim Burke proposed opendev/git-review master: Add classifiers for Python 3.10 and 3.11  https://review.opendev.org/c/opendev/git-review/+/907097
16:45 <frickler> seems we have a regression in linting for git-review; the failure is unrelated to the patch afaict https://zuul.opendev.org/t/opendev/build/6fbff753824c418aa26d173d5ffefb13
16:50 <corvus> Clark: i'm working on the skopeo thing
16:55 <clarkb> corvus: ok, is it more involved than simply using a newer client?
16:55 <clarkb> hrm, I guess the change that tried that failed too
16:57 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: ensure-skopeo: use make install  https://review.opendev.org/c/zuul/zuul-jobs/+/907100
16:57 <corvus> clarkb: i think that's the next mole to whack ^
16:57 <corvus> if it works, will need to squash
16:58 <clarkb> I guess that installs to /usr/local/bin by default?
17:00 <opendevreview> Tim Burke proposed opendev/git-review master: Use importlib.metadata instead of pkg_resources  https://review.opendev.org/c/opendev/git-review/+/907101
17:01 <corvus> clarkb: yep, verified in a local build
17:09 <opendevreview> Merged openstack/project-config master: Implement ironic-unmaintained-core group  https://review.opendev.org/c/openstack/project-config/+/902796
17:10 <opendevreview> Tim Burke proposed opendev/git-review master: Fix flake8 issue  https://review.opendev.org/c/opendev/git-review/+/907102
17:11 <timburke> frickler, thanks for letting me know; ^^^ should address it
17:13 <opendevreview> Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchiestrator, exn-middleware, overlay-network-agent  https://review.opendev.org/c/openstack/project-config/+/907060
17:16 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Retire the OpenInfra Labs mailing list  https://review.opendev.org/c/opendev/system-config/+/907103
17:26 <clarkb> fungi: where did we end up with keycloak?
17:26 <clarkb> fungi: I think if others don't object, renaming the database files and mounting them in the new locations seems reasonable. I guess make backups of the files too
17:28 <fungi> clarkb: yeah, that's what i'm doing, basically
17:28 <fungi> it also dawned on me that i should double-check the ownership and permissions inside the container
17:28 <clarkb> oh ya, those may have changed too. Fun
17:29 <fungi> does mirror02.iad3.inmotion need to remain in the emergency disable list, or is it working again?
17:29 <fungi> i seem to be able to ssh into it
17:29 <fungi> just want to be sure before i re-break our base deploy
17:29 <clarkb> I think it is happy after tonyb did surgery on that cloud
17:30 <clarkb> basically had to restart rabbitmq properly. We tried a naive approach which didn't work, but there was a kolla method that was better and got things working again
17:30 <fungi> okay, i've taken it back out but we should keep an eye on the deploy jobs
17:31 <clarkb> ++
17:31 <fungi> i also added keycloak01 to the disable list while i'm working on potential compose file edits
17:32 <clarkb> corvus: fwiw I checked snapcraft and microk8s latest/stable hasn't updated since they broke things. No new release yet. The bug has gotten some references added to it, so maybe it will get fixed
17:33 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Run buildset registry test jobs on ensure-skopeo changes  https://review.opendev.org/c/zuul/zuul-jobs/+/907104
17:34 <clarkb> corvus: which means we may want to go ahead with the nodepool change for now, then swing around and clean up zuul-jobs
17:34 <corvus> which nodepool change?
17:35 <corvus> https://review.opendev.org/906905
17:35 <clarkb> yup, that one. That change is what started my journey on friday :)
17:35 <clarkb> trying to land the ssh keyscan change
17:36 <clarkb> corvus: https://review.opendev.org/c/zuul/zuul-jobs/+/906907 was next, then https://review.opendev.org/c/zuul/zuul-jobs/+/906916
17:37 <corvus> yeah, i got those but missed the rollback one :)
17:37 <corvus> i think i missed it cause i started at the end and worked backwards
17:38 <clarkb> and then finally the ssh keyscan change you wrote is at the very beginning
17:40 <corvus> thanks for digging into that :)
17:41 <clarkb> you're welcome
17:51 <opendevreview> Merged opendev/git-review master: Fix flake8 issue  https://review.opendev.org/c/opendev/git-review/+/907102
17:52 <opendevreview> Tim Burke proposed opendev/git-review master: Use importlib.metadata instead of pkg_resources  https://review.opendev.org/c/opendev/git-review/+/907101
18:05 <clarkb> it took me a second to understand why ^ is necessary, since git-review depends on setuptools to install. Except it doesn't if you install from a wheel...
18:31 <timburke> clarkb, bingo -- idea was to have it as an alternative to https://review.opendev.org/c/opendev/git-review/+/898839?usp=search
18:45 <fungi> mmm, i'm getting increasingly out of my depth with the keycloak migration experiment... container log is now full of errors about failing to start the server because it can't establish a jdbc connection, reporting "ERROR: Wrong user name or password [28000-197]"
18:45 <fungi> i guess h2 databases have some sort of integrated credentials?
18:46 <clarkb> I didn't think they did. I thought it was more like sqlite. But maybe I'm wrong about that
18:47 <fungi> https://stackoverflow.com/questions/63800413/invalid-username-password-when-accessing-keycloaks-h2-database
18:47 <adamcarthur5> I've seen that in some cases people add a comment that re-runs the CI to check if a failure is a fluke. What is the best way to do this myself?
18:47 <clarkb> heh, that even points at keycloak
18:48 <fungi> adamcarthur5: start a review comment with the word "recheck", but ideally only do that after you've looked into the failure and feel confident it's not being caused by your change
18:48 <adamcarthur5> Okay great, thank you :))
18:48 <clarkb> you can also append any text after "recheck" if you want to add notes about why you are rerunning things
18:49 <JayF> Yes, please do not run `recheck` without some reasoning/explanation afterwards.
18:49 <JayF> even `recheck the code under test is not changed by this patch` is good as a minimum bar
18:49 <fungi> yes, i generally "recheck because ..." (some summary of the nature of the job failure and why i know it's unrelated)
18:49 <clarkb> fungi: things like that are making me think maybe we need to consider a mariadb and start over on config
18:49 <clarkb> just because we know how to work with a mariadb, and configuration of that is a bit more explicit
18:50 <clarkb> I dunno, this feels like a framework migration where they never considered h2 users because that isn't a "production" deployment, so it doesn't matter
18:50 <JayF> adamcarthur5 the ideal case is that, as a contributor to $project, you'd troubleshoot the random failure and fix it like some kinda coding superhero :D. The reality is that as an early contributor, the best thing you can do there is just read the logs, make sure you start recognizing if patterns emerge in failures, and if so raise a question in IRC or on the mailing list
18:51 <adamcarthur5> JayF yeah definitely. I'll just take it project by project and see how I get on :))
18:52 <clarkb> fungi: the only other idea I've got is if we can determine the old h2 db creds, we can probably configure quarkus keycloak to use those creds
18:52 <JayF> Yeah, the entire point is twofold: 1) each `recheck` comment is expensive, in terms of actual-computers-being-run, and 2) we do not want to get to a point where a periodic failure, say one that fails randomly 5-10% of the time, gets into the codebase because we "rechecked" to get a green
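The recheck convention described above (a review comment that starts with the word "recheck", optionally followed by a free-form reason) can be sketched as follows. This is a hypothetical illustration only: the `RECHECK_RE` pattern and the `parse_recheck` and `meets_minimum_bar` helpers are assumptions, not the trigger regex OpenDev's Zuul pipelines actually use.

```python
import re

# Hypothetical pattern: the comment must *begin* with the word "recheck";
# everything after it is treated as a free-form reason. This illustrates
# the convention only; it is not Zuul's real trigger regex.
RECHECK_RE = re.compile(r"recheck\b(?P<reason>.*)", re.DOTALL)


def parse_recheck(comment: str) -> tuple[bool, str]:
    """Return (requests_recheck, reason) for a review comment body."""
    match = RECHECK_RE.match(comment.strip())  # match() anchors at the start
    if match is None:
        return False, ""
    return True, match.group("reason").strip()


def meets_minimum_bar(comment: str) -> bool:
    """True only for a recheck that also explains why it was issued."""
    requested, reason = parse_recheck(comment)
    return requested and bool(reason)
```

For example, `meets_minimum_bar("recheck the code under test is not changed by this patch")` is true, while a bare `recheck` is not.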
18:52 <clarkb> fungi: also, why you would change those at all is beyond me. It only serves to make users' lives miserable
18:53 <clarkb> fungi: KC_DB_USERNAME and KC_DB_PASSWORD seem to be what you use to set things for postgres etc. Maybe we can set those values and have them work with h2
18:53 <fungi> clarkb: yeah, found this in an upgrade note too: "Keycloak ships for development purposes with an H2 database driver. As it is intended for development purposes only, it should never be used in a production environment. In this release, the H2 driver has been upgraded from version 1.x to version 2.x. This change might require changes to the H2 JDBC URL or migration of the H2 database files in
18:53 <fungi> an existing Keycloak setup. [...] Purge existing H2 database files to start with an empty database, export and import the realms using Keycloak’s export and import functionality, or refer to the migration notes on the H2 database project’s website for details on how to migrate H2 database contents."
18:54 <fungi> that's for 20.0.0 i think, but it reflects their sentiments fairly well
18:54 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Run buildset registry test jobs on ensure-skopeo changes  https://review.opendev.org/c/zuul/zuul-jobs/+/907104
18:54 <clarkb> fungi: ya, basically they don't care about people using h2
18:54 <clarkb> fungi: https://stackoverflow.com/questions/72863453/unable-to-access-keycloak-18-0-2-embedded-h2-db-file this has what may be the defaults we used
18:54 <clarkb> fungi: we could try forcing those values here. Otherwise maybe we consider a revert and plan for a proper db to avoid problems in the future?
18:55 <clarkb> oh wait, those values are for quarkus, not wildfly
18:56 <clarkb> but we might be able to look in the same files and go back in time to find the old values?
18:56 <corvus> clarkb, fungi: if our actual goal is to eventually proxy to other providers, then we could use h2 and reconstruct the contents via the api on every deployment.
18:56 <clarkb> corvus: we'd still need the user generated content though
18:56 <fungi> yeah, it looks like the container change included switching from wildfly to quarkus
18:56 <corvus> (or of course, use a real dbms; just mentioning the option)
18:57 <clarkb> fungi: yes, that is the difference between legacy and not legacy
18:57 <clarkb> and it seems like they didn't put a lot of effort into making that transition easy
18:57 <clarkb> I mean they could've copied the db from the old location to the new one and continued to use the same credentials
18:58 <fungi> corvus: yeah, i think that's where we're headed. the main question i have is whether we should roll back the container change temporarily, and whether we need to include steps to export the realm data from h2 and import it into mariadb, or want to start with a clean slate
18:58 <fungi> so two questions i guess
18:58 <corvus> clarkb: oh i guess we would accumulate mappings, wouldn't we?
18:58 <clarkb> corvus: yes exactly. Over time users would be building out the db, telling keycloak what backend identities they have for the frontend of opendev id (for lack of better terminology)
18:59 <corvus> then i agree dbms is the right long-term answer; just a question of whether we want to take that hit now or later
19:00 <fungi> not trying to keep our old data would only impact maybe half a dozen users, but i'm open to doing the extra export/import if folks want
19:01 <fungi> also, since we only have keycloak hooked up to one system for now and it's not in a critical path for anything, it seems like this would be the time to switch it to a proper sql server. that will in theory greatly simplify the container version updates we want to do once we're on the "new style" image, and also means less data to lose or migrate if we do it before working on tying it to more
19:01 <fungi> systems
19:02 <clarkb> ya, I think it also simplifies working with that data
19:02 <clarkb> since we know how to talk to a mariadb (or even a postgres), but h2 is a bit more complicated
19:12 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container"  https://review.opendev.org/c/opendev/system-config/+/907119
19:14 <fungi> oh funky... default behavior of git revert has changed! now if you revert a revert, the subject changes to 'Reapply "whatever the reverted commit subject was"' rather than 'Revert "Revert ..."'
19:14 <clarkb> yup. I left a note in there about keeping the new test, as I think that is valuable
19:15 <fungi> the test may need altering because of the change to the url, but i can do that. or i can reintroduce the test as a separate change before the reapply
19:15 <fungi> preference?
19:16 <clarkb> ya, you need to add the /auth/ prefix (noted that in my comment)
19:16 <clarkb> I'm fine with a followup change to the revert
19:18 <fungi> also i've undone my test changes to the db files (wiped both paths we're mounting into the container and replaced their content with the local backup copies i made)
19:18 <fungi> once we merge a revert, i'll take the server back out of the disable list
19:18 <clarkb> sounds like a plan
19:21 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container"  https://review.opendev.org/c/opendev/system-config/+/907119
19:28 <fungi> clarkb: something to consider... if we don't care about exporting/importing, maybe it would make more sense to make the jump to the latest keycloak lts version instead of incrementally upgrading?
19:30 <clarkb> fungi: ++
19:32 <fungi> huh, actually they don't do "lts" (unless you count red hat's commercial version); our choices are between nightly branch tips, latest release version, or pinning a major (or minor/patch/build) version
19:33 <fungi> at the moment, 23.0.5-0 == 23.0 == latest
19:34 <fungi> so we'd either do latest and live with potential surprises (like in a month when 24.0.0 is released) or 23.0 i think, and then keep on top of updates for it
19:34 <fungi> i'll start conservatively with 23.0 and see what happens
19:35 <clarkb> I think we did latest previously
19:35 <clarkb> and then latest switched to quarkus and broke, so we switched to legacy and that updated a few releases without trouble
19:35 <clarkb> the main risk is probably if they do another big shift like quarkus
19:55 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0  https://review.opendev.org/c/opendev/system-config/+/907141
20:44 <fungi> https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/keycloak01.opendev.org/docker/keycloak-docker_keycloak_1.txt#41 says "Listening on: http://127.0.0.1:8080" at 20:15:22, but then testinfra fails at 20:17:15 saying that it wasn't https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/job-output.txt#17294
20:45 <fungi> host networking reaching the container?
20:45 <fungi> it's definitely odd, because it *was* listening on 8080/tcp in the prior change when it was still using the h2 backend
20:46 <fungi> rather, the test that it was listening passed
20:47 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container"  https://review.opendev.org/c/opendev/system-config/+/907119
20:47 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0  https://review.opendev.org/c/opendev/system-config/+/907141
20:48 <fungi> still trying to get the new test to pass on the revert, but also i've set an autohold for the upgrade change
21:20 <clarkb> looking at db stuff for keycloak, they don't do utf8mb4...
21:20 <clarkb> that won't explain your test failures, but wow, in 2024 we still can't get reliable utf8 from mysql/mariadb
21:22 <clarkb> fungi: looking at the mariadb log https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/keycloak01.opendev.org/docker/keycloak-docker_mariadb_1.txt I don't see it logging any connections
21:22 <clarkb> I don't know that it would necessarily, but maybe we aren't connecting properly and keycloak is dying?
21:25 <fungi> i would have expected to see that in its log, which i didn't. but hey, who knows
21:26 <fungi> i'll know more once the held node is there
21:27 <fungi> held node is 217.182.142.60
21:27 <fungi> it's still running
21:27 <fungi> netstat indicates there's something listening at 127.0.0.1:8080
21:28 <fungi> i get html back from it too, connecting over the loopback on the host
21:31 <fungi> what would the is_listening assert be checking if not that?
21:32 <clarkb> I would expect it to do the equivalent of a netstat or ss for that port and ip
21:41 <fungi> which is what i did first, and it also listed the socket in a listening state (netstat -lnt)
21:42 <fungi> hrm, though it shows up under tcp6 and not tcp
21:42 <fungi> tcp6       0      0 127.0.0.1:8080          :::*                    LISTEN
21:43 <fungi> but i was still able to query http://127.0.0.1:8080/ with wget, no problem
21:43 <clarkb> maybe change the tcp:// to tcp6:// in the test?
21:44 <clarkb> I wonder if that is filtering too aggressively
21:45 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0  https://review.opendev.org/c/opendev/system-config/+/907141
21:45 <fungi> worth a shot
21:58 <clarkb> starting on tomorrow's meeting agenda. I'll add notes about keycloak and remove the zuul db item
21:58 <clarkb> I guess I should look at the centos wheel builds to catch up on that
22:00 <clarkb> looks like the publish jobs are working now but not the release jobs
22:01 <clarkb> 'VLDB: no such entry' and "afs_volume": "mirror.wheel.cent8a64"
22:04 <clarkb> I see the issue
22:10 <opendevreview> Clark Boylan proposed openstack/project-config master: Fix wheel_volume values for centos stream wheel mirrors  https://review.opendev.org/c/openstack/project-config/+/907150
22:23 <fungi> clarkb: no dice... "Cannot validate protocol 'tcp6'. Should be tcp, udp or unix"
22:24 <clarkb> fungi: I hate this next idea, but maybe we rewrite the test to do an http fetch, which, if successful, implies the socket is listening
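The HTTP-fetch idea above could look roughly like the following. This is a hedged sketch that uses only the Python standard library; `http_listening` is a hypothetical helper name, and the URL would be whatever the test targets (for example `http://127.0.0.1:8080/`). Any HTTP response counts as "listening", including an error status. Connection-level failures count as "not listening".

```python
import http.client
import urllib.error
import urllib.request


def http_listening(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything at `url` answers with an HTTP response."""
    # Ignore proxy environment variables: we want to hit the local socket itself.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx status still means a server accepted the connection.
        return True
    except (OSError, http.client.HTTPException):
        # Refused, timed out, reset, or not speaking HTTP: treat as down.
        return False
```

This check proves slightly more than a socket-table lookup: a server accepted a connection and spoke HTTP. It also avoids parsing `ss` or `netstat` output entirely.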
22:24 <clarkb> the other thought is to make sure the correct server is being tested
22:24 <fungi> or delete the test entirely, since we now have more thorough tests anyway?
22:25 <clarkb> ya
22:25 <fungi> it had a purpose back when we didn't actually interact with the service in any tests
22:25 <clarkb> it is weird that it is failing though
22:26 <fungi> oh, the existing tests don't connect to the listener from the container directly though; they go through the apache reverse-proxy
22:26 <fungi> but regardless, i expect they'd fail if it wasn't up and listening
22:27 <clarkb> and the container uses host networking, so the distinction there isn't meaningful. I meant more: is the testinfra node the correct one?
22:27 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0  https://review.opendev.org/c/opendev/system-config/+/907141
22:28 <fungi> i've got a new autohold set for it too
22:29 <fungi> in part, i'm curious to see if the subsequent tests actually work even though that one was failing
22:31 <fungi> oh, in fact they did pass in previous builds too
22:32 <fungi> so only the is_listening check for 8080/tcp on localhost was failing, but the proxied interactions (using the admin test creds to issue a token and then making an admin-only api call with that token) through apache's proxy were fine
22:34 <clarkb> yup, it has to be something to do with the test, but whatever it is isn't very obvious
22:39 <clarkb> I've updated the meeting agenda. Anything else to add?
22:40 <fungi> looks like it relies on either ss or netstat depending on what's available: https://github.com/pytest-dev/pytest-testinfra/blob/main/testinfra/modules/socket.py#L212-L312
22:40 <fungi> defaults to ss if present, so i'll test with that on the next held node to see if i can spot the problem
22:49 <fungi> it seems to do some rather fragile parsing of ss output too
23:36 <opendevreview> Jeremy Stanley proposed opendev/system-config master: DNM: Fail keycloak testing for an autohold  https://review.opendev.org/c/opendev/system-config/+/906600
23:40 <fungi> forgot the autohold wasn't going to do much good if i got the change to a passing state
23:41 <clarkb> that looks like a noop rebase though?
23:41 <clarkb> but I think it will fail
23:41 <fungi> i restored the dnm change i was using before and then cherry-picked it onto the passing change
23:42 <clarkb> ah
23:42 <fungi> so autohold is now set on the dnm change that appends an always-failing test
23:44 <fungi> then i'll step through testinfra.modules.socket.LinuxSocketSS._iter_sockets() and see what it comes back with
23:49 <fungi> looks like it should do `ss --numeric --listening` and then check all the returned lines for a match
23:50 <fungi> wow, that method has a variable named "local"
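One possible explanation for the failing `is_listening` check, though it is not confirmed anywhere in this log. netstat reported the socket under `tcp6` with a local address of `127.0.0.1:8080`, which suggests an IPv6 socket carrying an IPv4-mapped address. Suppose `ss` prints that socket as `[::ffff:127.0.0.1]:8080`; that output format is an assumption here. A parser that compares the raw local address against `127.0.0.1` would then find no match. The sketch below is hypothetical and is not testinfra's `_iter_sockets()` implementation. It assumes the default `ss --numeric --listening` column layout, and it shows how unwrapping IPv4-mapped addresses with Python's `ipaddress` module would let such a comparison succeed.

```python
import ipaddress


def normalize_host(host: str) -> str:
    """Drop any %interface scope and brackets, then unwrap IPv4-mapped IPv6."""
    host = host.split("%", 1)[0].strip("[]")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, e.g. "*" for a wildcard bind
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)  # "::ffff:127.0.0.1" -> "127.0.0.1"
    return str(addr)


def iter_listening(ss_output: str):
    """Yield (netid, host, port) tuples from `ss --numeric --listening` text.

    Assumes the default column layout:
    Netid State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
    """
    for line in ss_output.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 6:
            continue
        netid, local = fields[0], fields[4]
        host, _, port = local.rpartition(":")
        yield netid, normalize_host(host), port


def is_listening(ss_output: str, proto: str, host: str, port: int) -> bool:
    """Hypothetical stand-in for a check like testinfra's is_listening."""
    wanted = (proto, normalize_host(host), str(port))
    return any(sock == wanted for sock in iter_listening(ss_output))
```

On a live host, the input would come from something like `subprocess.run(["ss", "--numeric", "--listening"], capture_output=True, text=True).stdout`. ss reports `tcp` as the Netid for both IPv4 and IPv6 TCP sockets, so the protocol comparison itself is not the problem in this sketch. Only the address form differs.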

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!