Monday, 2023-02-13

opendevreviewJeremy Stanley proposed openstack/project-config master: Farewell limestone  https://review.opendev.org/c/openstack/project-config/+/87347500:07
opendevreviewJeremy Stanley proposed openstack/project-config master: Remove empty limestone nodepool providers  https://review.opendev.org/c/openstack/project-config/+/87347600:07
Clark[m]fungi: small thing wiht the first chngae in that stack00:15
opendevreviewJeremy Stanley proposed openstack/project-config master: Farewell limestone  https://review.opendev.org/c/openstack/project-config/+/87347500:30
opendevreviewJeremy Stanley proposed openstack/project-config master: Remove empty limestone nodepool providers  https://review.opendev.org/c/openstack/project-config/+/87347600:30
fungioops, thanks!00:30
fungiand yeah, i spotted the git repo after i had deleted files that mentioned limestone in their name, forgot to go back and double-check those weren't related, readded the one now00:31
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: increase ssh channel debugging  https://review.opendev.org/c/opendev/system-config/+/87321403:23
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: send logs to syslog  https://review.opendev.org/c/opendev/system-config/+/87348004:29
opendevreviewIan Wienand proposed opendev/system-config master: logrotate: don't use filename to generate config file  https://review.opendev.org/c/opendev/system-config/+/87348104:29
opendevreviewIan Wienand proposed opendev/system-config master: logrotate: remove old log workaround  https://review.opendev.org/c/opendev/system-config/+/87348204:29
*** yadnesh|away is now known as yadnesh04:31
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: increase ssh channel debugging  https://review.opendev.org/c/opendev/system-config/+/87321404:47
*** ysandeep is now known as ysandeep|lunch08:04
*** ralonsoh_ is now known as ralonsoh08:37
*** jpena|off is now known as jpena08:54
ianwclarkb: something weird is going on09:16
ianwhttps://review.opendev.org/c/opendev/system-config/+/873214/9/playbooks/zuul/bootstrap-test-review.yaml sets " gerrit logging set-level DEBUG com.google.gerrit.sshd"09:17
ianwi can see in https://878773ae5f0379c9d548-8605f807f0b1cdc197533279c15edd0d.ssl.cf1.rackcdn.com/873214/9/check/system-config-run-review-3.6/2d81c9f/bridge99.opendev.org/ara-report/results/389.html this has applied09:18
ianwwhen we ssh, stuff should come out on stderr09:18
ianwand end up in https://878773ae5f0379c9d548-8605f807f0b1cdc197533279c15edd0d.ssl.cf1.rackcdn.com/873214/9/check/system-config-run-review-3.6/2d81c9f/review99.opendev.org/docker/gerrit-compose_gerrit_1.txt09:19
ianwwe do not see anything09:19
ianwwhen i tried this quickly on review02, i saw all sorts of debug stuff coming out09:20
ianwit looks something like09:20
ianw[2023-02-13 03:31:07,893] [sshd-SshDaemon[4fc41cba](port=22)-nio2-thread-17] DEBUG org.apache.sshd.server.session.ServerSessionImpl : encode(ServerSessionImpl[null@/127.0.0.1:56142]) packet #6 sending command=60[60] len=30009:21
*** ysandeep|lunch is now known as ysandeep09:23
ianwhttps://opendev.org/zuul/zuul-jobs/src/commit/ff1836691e00bc79a70d55f94e9aa5584f0c2c67/roles/collect-container-logs/tasks/main.yaml#L18 collects the logs; that doesn't seem like it could be missing anything09:32
ianwOOOOHHHHHHH09:37
ianwwe stop it!09:37
ianwhttps://878773ae5f0379c9d548-8605f807f0b1cdc197533279c15edd0d.ssl.cf1.rackcdn.com/873214/9/check/system-config-run-review-3.6/2d81c9f/bridge99.opendev.org/ara-report/playbooks/7.html09:37
ianwwhen we rename repos09:38
ianwhttps://opendev.org/opendev/system-config/src/branch/master/playbooks/rename_repos.yaml#L809:38
ianw... therefore, when we dump the logs at the end of the testing run -- that instance hasn't had debugging turned up.  this makes sense, i thikn09:39
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: increase ssh channel debugging  https://review.opendev.org/c/opendev/system-config/+/87321409:42
ianw^ that rebases it on 873480 -- which saves the gerrit container logs to a file on disk.  that should capture both (and for upgrades, etc, all other stop/starts)09:43
opendevreviewMerged openstack/project-config master: Update StarlingX docs promote job for R8 release  https://review.opendev.org/c/openstack/project-config/+/87326609:50
*** soniya is now known as soniya|afk10:36
*** tweining_ is now known as tweining11:16
*** priteau_ is now known as priteau12:25
*** dasm|off is now known as dasm13:35
*** ysandeep is now known as ysandeep|out15:15
clarkbfungi: the limestone project-config changes lgtm now. Did we put the limestone mirror in the emergency file so that base can run before these land?17:15
clarkbI'm guessing not since we still got emails early this morning relative to me17:15
fungii hadn't, no, figured it wasn't urgent17:20
clarkbI mostly wanted to make sure we hadn't uncovered whatever the next issue is by doing so17:20
fungiand this way we know the mirror is still dead17:20
clarkbwfm and the stack lgtm too. Just need someone else to double check (though maybe that isn't super necessary for these changes)17:22
*** yadnesh is now known as yadnesh|away17:52
clarkbianw: the gerrit changes still don't seem to collect the logs we expect to see (not in error_log, sshd_log or the containers log via syslog). Maybe we aren't doing ssh things after set level?17:54
clarkbinfra-root: I've been thinking about replacing gitea servers recently. It looks like our haproxy configs for gitea and gerrit replication configs for gitea are separated from our gitea group membership. This means we should be able to deploy a new server and check that it configured gitea properly, then update the gerrit configuration to replicate to the server, trigger complete18:03
clarkbreplication for that server, wait, then add the server to the haproxy config18:03
clarkbThen probably remove an old server before adding a second new server to keep total resource consumption down18:03
clarkbThe last big questions in my head are do we want to continue to use boot from volume for the new servers?18:04
clarkbOh and whether or not the current rough sizing is still working18:04
clarkbI think the current sizing is probably fine, just a matter of sorting out how we want to supply disks18:04
*** jpena is now known as jpena|off18:05
*** jgwentworth is now known as melwitt18:33
fungiwhat's the main driver for replacement? new distro version i guess?19:00
clarkbyes that19:01
clarkbI haven't been able to find a lot of time to devote to that so I'm trying to prioritize things. I think giteas, nameservers, and ehtherpad are high on the list if anyone is able/willing to help19:02
clarkboh! we need the redirects19:21
clarkbyay for talking about this out loud :) What I'm not sure about is whether or not we need all of the other project data from the database or if we can just move the redirects over from that table in gitea19:22
clarkbalmost makes me want to handle redirects outside of gitea...19:22
clarkbI think the process might be more like deploy new server (this will create empty repos but not redirects), sort out redirects, add server to gerrit replication, add server to haproxy19:23
clarkbI'm going to set up a test node with a hold since adding redirects to a test node is almost an identical problem to solve19:23
clarkbhttps://review.opendev.org/c/opendev/system-config/+/848181 has been rechecked and a hold is in place19:25
*** dviroel is now known as dviroel|out19:26
fungimanaging redirects in gitea seems like it more gracefully handles the magic of git interactions rather than just the browseable ui (though presumably we could do both in apache configuration with some effort, but we didn't have them fronted by apache when we first started doing it)19:45
ianwclarkb: yeah this has me stumped.  i see messages in error_log, but not on stderr19:51
opendevreviewIan Wienand proposed opendev/system-config master: logrotate: don't use filename to generate config file  https://review.opendev.org/c/opendev/system-config/+/87348120:03
opendevreviewIan Wienand proposed opendev/system-config master: logrotate: remove old log workaround  https://review.opendev.org/c/opendev/system-config/+/87348220:03
*** Tengu7 is now known as Tengu20:19
clarkblooking at the gitea99 database we do have a rename in there (woo testing) which shows that I don't think we can safely copy the db from one gitea to another to copy the redirects. Instead we need to construct the local owner ids and target repo ids against the current database. Its possible that dumping the entire db and using the source from an old server would work though as the20:50
clarkbowner ids and stuff would match. I just don't know if that would introduce other incompatibilities20:50
clarkbThis has me thinking the best thing might be to dump the redirects from an existing server then write a script to lookup the corresponding values in the new server database and insert the correct rows?20:51
ianwsince linaro is all turned off; i'll merge https://review.opendev.org/c/opendev/system-config/+/871672 and then remerge those limestone, etc. patches 21:02
clarkb`SELECT user.id AS source_org_id, user.name AS source_org_name, repo_redirect.lower_name AS source_repo, repository.id AS target_repo_id, repository.lower_name AS target_repo_name FROM user INNER JOIN repo_redirect ON user.id = repo_redirect.owner_id INNER JOIN repository ON repo_redirect.redirect_repo_id = repository.id;` that seems to grab the info we need to reconstruct in another21:02
clarkbgitea install21:02
ianwalthough it's not saying they conflict21:03
clarkbbasically we need to use source_org_name to determine the new gitea's source_org_id and target_repo_name to determine target_repo_id on the new gitea. Then we can insert into repo_redirect (new_org_id, source_repo, new_target_id)21:04
clarkbinfra-root ^ do you think building it that way from either the old gitea db or our records is better than just trying to replace the gitea db on the new install with the gitea db from an old install?21:05
clarkbI'm mostly worried about things that may be different somehow through the intsallation processes. Like maybe hashes of files because they incorporate an id or something then when we change the ids around the hashes will be invalid?21:05
clarkbI kinda like the idea of building it from our records just because that was always the DR idea anyway21:06
clarkband we can cross check that against what we get from an existing install too. But I don't want to do a bunch of that effort if we think it is overkill21:06
fungiianw: thanks! i didn't do that one because i thought i remembered you having one up already21:07
clarkbone thing that will complicate this is the redirects table doesn't record a target org, just the target repo id. THis means that we may have to dedup or flatten the redirects ourselves?21:08
clarkbthe held node's table definitely seems to have flattended things21:08
ianwclarkb: i have to admit this component of gitea configuration has completely passed me by.  i'm guessing because i didn't help set it up.  we mention copying the db in the docs, this is why?21:09
ianw(system-config docs)_21:09
clarkb(on the test node we do this series of renames openstack/diskimage-builder -> opendev/diskimage-builder -> opendev/disk-image-builder, but we end up with two directs that both point at the final destination)21:09
clarkbianw: ya the db copy is purely for the redirects21:09
clarkbbut now that image handling is different I wonder if it is still safe. I guess we can just try it and see what happens too21:10
fungiianw: if you're good with 873475 i'll progressively un-wip the others as that merges/deploys21:10
clarkbwe also backup the gitea db only for the redirects. Everything else should be generateable from configs. THis makes me think investing some time in making the redirects work from config is a good idea too21:10
ianwfungi: yeah was just double checking on the pool quiestion21:11
clarkbI'm definitely not going to get around to testing either today though so we can think this through a bit21:12
clarkbI'll make note of it on the meeting agenda when I get around to writing that21:12
clarkbto generate it from our configs I think for each old/new name we query the gitea db for old_org id and then check if the new org/repo is present in repository. If so make note of the repository id and now we have enough info to insert to the redirects table. If not then skip until all renames are processed. At the end of that check our redirects and see if we've redirected multiple21:14
clarkbtimes and then flatten?21:14
clarkbshould be straightforwardish, just a matter of interacting with their db out of band and hoping they don't change the schema on us21:14
clarkbooh and we can grab a gitea db dump and check the scripting against that to ensure we don't miss anything etc21:15
*** iurygregory_ is now known as iurygregory21:22
opendevreviewIan Wienand proposed openstack/project-config master: Farewell limestone  https://review.opendev.org/c/openstack/project-config/+/87347521:22
opendevreviewIan Wienand proposed openstack/project-config master: Remove empty limestone nodepool providers  https://review.opendev.org/c/openstack/project-config/+/87347621:23
ianw^ i just added the empty pool here, as that was what i did with linaro and it worked21:23
fungioh, we do need an empty pool even though max-servers was already 0?21:24
clarkbianw: typo in there.21:24
clarkbfungi: I suspect pool maybe a required nodepool configuration21:24
fungioh, so nodepool itself breaks if no pool (even an empty one) is defined for a provider?21:24
fungii guess we don't have functional testing of those configs21:25
clarkbmaybe. I wasn't sure. I don't think we've done it before which was why I left a comment with that concern21:25
opendevreviewIan Wienand proposed openstack/project-config master: Farewell limestone  https://review.opendev.org/c/openstack/project-config/+/87347521:25
opendevreviewIan Wienand proposed openstack/project-config master: Remove empty limestone nodepool providers  https://review.opendev.org/c/openstack/project-config/+/87347621:25
ianwdoh.  i'm also not sure; i guess it passes the syntax check21:26
clarkbfor pool in self.provider.get('pools', []):21:26
clarkbso maybe it isn't required21:26
clarkbanyway doesn't hurt to have an explicit empty pool21:26
clarkbI've approved that first change21:26
clarkbthe second one should only land after the images are cleaed up21:27
clarkb(assuming that can happen gracefully)21:27
ianw++21:27
clarkbhere's another gotcha for redirects: old orgs that are no longer in our config. For example osf got renamed to openinfra21:29
clarkbthat may make it easier to just grab the db and copy that way21:30
clarkbwe could create the org via the api as an alternative without repos though21:30
clarkbthinking out loud on that a bit more maybe our ansible for gitea should already be doing that (it doesn't today but we could add it)21:31
clarkb"here are the list of orgs you need to have even if they don't show up in a repo)21:31
clarkbI'm beginning to think that we should try just copying an old db to start since that is relatively quick to try. And we can stick it on a held node too first to see if there are any glaringly obvious issues21:38
clarkbthe more I pull on the redirects the more I'm convinced doing it programmatically is possible but also that it would likely take a bit of time to get right21:39
fungiand could be fragile if they change how they store those21:49
opendevreviewMerged openstack/project-config master: Farewell limestone  https://review.opendev.org/c/openstack/project-config/+/87347522:05
opendevreviewMerged opendev/system-config master: Remove linaro-us cloud  https://review.opendev.org/c/opendev/system-config/+/87167222:15
ianwclarkb: as the official #opendev java developer ... am i reading https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.6.3/java/com/google/gerrit/pgm/util/ErrorLogFile.java#71 right ...22:33
ianw... it looks to me that it removes all logging appenders setup, and then sets up info logs to stderr, and looks for configs that then setup error_log and optionally json output?22:34
fungihe probably wishes shrews would come back and resume his post as the official java scapegoat22:34
ianwit seems to me gerrits starts up and looks at https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.6.3/resources/log4j.properties22:35
ianwand then at some early point, removes that appender setup there and puts in the new one in initLogSystem()22:36
clarkbianw: at first glance that looks right. It isn't clear to me how that text log there becomes the error_log file though22:36
ianwLOG_NAME @ https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.6.3/java/com/google/gerrit/pgm/util/ErrorLogFile.java#93 is error_log, so that's where it starts adding things to errorlog22:37
clarkbaha22:37
clarkbI guess it is possibly running that setup after you've configured the ssh loggers nuking them?22:37
ianwi don't think it changes the level, but it might explain why some things we see on the stderr output and some things in error_log22:38
clarkbit almost looks like they are twodifferent streams though and not one a superset of another due to log level22:39
clarkbbut maybe they are a super/sub set of each other and I just didn't compare accurately22:40
ianwi'm starting to wonder if public void channelInitialized(Channel channel) {22:43
ianwin https://gerrit-review.googlesource.com/c/gerrit/+/357694/6/java/com/google/gerrit/sshd/ChannelIdTrackingUnknownChannelReferenceHandler.java is actually ever being called22:43
clarkbianw: did we try a patchset where we log at error level or whatever?22:44
clarkbso that we're not relying on set-level?22:44
ianwi did, still didn't see anything22:44
ianwit doesn't help that none of the level names match up22:44
clarkbheh ya thats a really fun design choice in java land22:45
clarkbtook a bit of googling to figure out what the levels are.22:45
clarkbMy hunch if logging at error/higher level isn't showing up is that that call isn't being made22:45
clarkbianw: terrible idea but probably easier than attaching a debugger: add an assert False equivalent to the methods22:46
clarkbthen see if that raises exceptions indicating they are called22:46
ianwthat is a good idea22:47
ianwhttps://95663730722e429c421d-5cb3754a5639f7be0289e9ab84888d65.ssl.cf2.rackcdn.com/873214/10/check/system-config-run-review-3.6/290079b/review99.opendev.org/logs/error_log22:47
ianwhas22:47
ianw[2023-02-13T10:40:36.902Z] [sshd-SshDaemon[3d2be0df](port=22)-nio2-thread-1] DEBUG com.google.gerrit.sshd.CachingPublicKeyAuthenticator : authenticate(admin@ServerSessionImpl[null@/127.0.0.1:53854]) use cached result=true for ssh-ed25519 key=SHA256:qy4xoVw4bvmgIt3WABlTI3zXMMsNhphibvW5DQ0h1fo22:47
ianwi.e. it showed a debug statement from com.google.gerrit.sshd22:47
clarkboh ya that would certainly lend more evidence to the theory it isn't being called at all then22:48
ianw*but*, on review02, when i turned that up22:49
ianwi see that in the stderr output of "docker logs"!22:49
*** dasm is now known as dasm|off22:54
ianwIn order to use it, the handler instance needs to be registered as **both** an `UnknownChannelReferenceHandler` and a `ChannelListener`.22:56
ianwhttps://github.com/apache/mina-sshd/commit/11b33dee37b5b9c71a40a8a98a42007e3687131e22:57
ianwdoes https://gerrit-review.googlesource.com/c/gerrit/+/238384/9/java/com/google/gerrit/sshd/SshDaemon.java register it as both?22:57
clarkbwell we know that enabling it seems to side effect something that disabling it works around?22:59
ianwwell perhaps it's "half" enabled23:00
clarkbwhich would imply that whatever they've done in the original change was sufficient. That said I don't see it being added as a channellistener in that file23:00
ianwin https://gerrit-review.googlesource.com/c/gerrit/+/238384/9/java/com/google/gerrit/sshd/ChannelIdTrackingUnknownChannelReferenceHandler.java23:00
ianw handleUnknownChannelCommand gets called -- we know that because we saw the error23:00
ianwbut public void channelInitialized never gets called, because it's not registered as a "regular" handler?23:01
clarkbooh that could be23:01
clarkbit implements the ChannelListener interface but then isn't registered as one?23:01
clarkband you probably want to do that at initUnknownChannelReferenceHandler() because I'm not sure if it makes sense on its own23:02
ianwaddChannelListener() doesn't appear in the gerrit source code afaics23:05
clarkbI'm trying to understand how the setUnknownChannelReferenceHandler is a valid call in that context23:05
clarkbI guess SshServer may include it?23:06
clarkbah yup thats it23:06
clarkbianw: addChannelListener is also valid on SshServer so I think you can just call it in initUnknownChannelReferenceHandler() if using the channel tracking23:09
clarkbI don't think it makes sense if channel tracking is disabled so that is agood place for it ?23:09
fungiclarkb: 873476 should be ready to go now. the parent has deployed and i've checked nodepool image-list for any remaining entries indicating that provider23:09
clarkbfungi: cool do you want me to approve it?23:10
fungifeel free. i'm around, you just hadn't +2'd teh new revision23:11
clarkboh I expected my votes to carry over. Let me see23:14
clarkbthe parent changed which I guess was enough to do it?23:14
clarkbeverything else is the same. I didn't expect that.23:14
fungipart of the deletion moved into the parent change, so that removed some of the child change23:15
clarkbaha and its looking at the diffs being the same not just the end result23:15
clarkbapproved23:15
fungiit was a net zero change, but what's being depeted move out of one change an into another23:15
fungis/depeted/deleted/23:15
fungiso wasn't simply a rebase23:16
fungialso 871672 failed in deploy, presumably because of what 873428 will fix23:17
opendevreviewMerged openstack/project-config master: Remove empty limestone nodepool providers  https://review.opendev.org/c/openstack/project-config/+/87347623:25
opendevreviewIan Wienand proposed opendev/system-config master: [wip] gerrit: increase ssh channel debugging  https://review.opendev.org/c/opendev/system-config/+/87321423:31
clarkbTomorrow's the last day for service coordinator nominations. I guess I'm it again? I'll wait util after tomorrow's meeting to make it official23:37
clarkbfungi and I will be out for next week's meeting too so I'm proposing we skip it on the agenda for tomorrows 'meeting23:41
fungioh, good point23:41
clarkbianw: did the mm3 AutoField change land?23:42
ianwi don't think so.  they did merge a "fix" upstream too23:42
clarkbnot yet /me sticks that in the agenda23:42
ianwthey were specifying it in the settings.py, but we're using the one based on the docker build23:42
clarkbah so maybe we need to merge those together?23:43
clarkbthough sticking close to upstream docker is a good idea just for sanity reasons I suspect23:43
ianwi dunno.  everyone could drop it from their settings.py now that it's in the app definition23:46
clarkbbut only if they've released the fixed version and we are on the fixed version? I suspect we need to upgrade os we'll get there eventually23:46
opendevreviewMerged openstack/project-config master: Remove add_master_python3_jobs.sh  https://review.opendev.org/c/openstack/project-config/+/87301623:48
fungiianw: https://review.opendev.org/873428 is ready to go now the prior cleanup has merged23:49
clarkbok I've done a first pass at an updated agenda. Anything else we'd like to add? If zuul was still having some struggles we could go over those but they should mostly be addressed with the weekend restart23:50
fungiand that should fix the base deploy23:50
ianwfungi: is that really in merge conflict?23:51
fungioh, maybe it is since the linaro cleanup and i didn't spot it23:52
fungii can rebase that series23:52
ianwit is the linaro stuff -- i don't mind rebasing as i caused it, lmn :)23:52
opendevreviewJeremy Stanley proposed opendev/system-config master: Farewell limestone  https://review.opendev.org/c/opendev/system-config/+/87342823:57
opendevreviewJeremy Stanley proposed opendev/system-config master: Finish cleaning up packethost references  https://review.opendev.org/c/opendev/system-config/+/87342923:57
opendevreviewJeremy Stanley proposed opendev/system-config master: Final cleanup of internap/inap/iweb references  https://review.opendev.org/c/opendev/system-config/+/87343023:57
fungiianw: no problem, was easy conflict to resolve, just delete things23:57

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!