Monday, 2021-03-01

*** tosky has quit IRC00:11
ianwThe site uses SHA1 in signatures which is not allowed in the DEFAULT policy per Fedora 33 Change:00:44
ianwi.e. you now can't connect to the rax emergency console via firefox :/00:45
ianwi'm looking at the upgrade of the afs db servers, and it's probably a better idea to not do this over ssh00:45
*** LowKey has quit IRC01:28
*** LowKey has joined #opendev01:28
auristorianw: I've mentioned it before but will bring it up again.   two afs database servers provide redundancy for reads but no redundancy for writes, as only the server with the lowest ip address can be elected the coordinator.  afsdb02.openstack.org in this case.02:43
auristorcan a third database server be added?   perhaps co-locate with afs01.ord.openstack.org if bringing up another VM is not an option.02:45
ianwauristor: i could add another.  but if i take, say, afs02 down, doesn't afs01 become the lowest ip (i.e. the only)?02:48
auristorno02:50
auristorthe ubik quorum is defined by the list of voting primary ip addresses as specified in the ubik service's CellServDB file.02:51
auristorThe server with the lowest ip address gets 1.5 votes and the others 1 vote.  To win election requires greater than 50% of the votes.  In a two server configuration there are a total of 2.5 votes to cast.   1.5 > 2.5/2 so afsdb02.openstack.org always wins regardless of what afsdb01.openstack.org says.   And afsdb01.openstack.org can never win because 1 < 2.5/2.02:53
auristorby adding a third ubik server to the quorum, the total votes cast are 3.5 and it always requires the vote of two servers to elect a winner02:54
ianwauristor: thanks for the detailed info.  i'd assumed that "bos stop"-ing it would pull it, i didn't realise it was based on the config file.  i can look at adding a new server02:58
auristorif afsdb03 is added with the highest ip address, then either afsdb01 or afsdb02 can be elected02:58
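The vote arithmetic auristor describes can be checked with a small sketch. This only models the rule as stated in the log (lowest address gets 1.5 votes, everyone else 1, winner needs more than half); it is not ubik's actual implementation, and the IP addresses are hypothetical placeholders chosen so that afsdb02 has the lowest one.

```python
import ipaddress

# Hypothetical addresses; only their ordering matters (afsdb02 is lowest).
AFSDB02 = "192.0.2.10"
AFSDB01 = "192.0.2.20"
AFSDB03 = "192.0.2.30"


def vote_weights(servers):
    """Give the lowest IP address 1.5 votes and every other server 1 vote."""
    lowest = min(servers, key=ipaddress.ip_address)
    return {s: (1.5 if s == lowest else 1.0) for s in servers}


def can_win(servers, voters):
    """Return True if the combined votes of `voters` exceed 50% of all votes."""
    weights = vote_weights(servers)
    total = sum(weights.values())
    return sum(weights[v] for v in voters) > total / 2


two = [AFSDB01, AFSDB02]
three = [AFSDB01, AFSDB02, AFSDB03]

print(can_win(two, [AFSDB02]))            # lowest IP alone, two-server cell
print(can_win(two, [AFSDB01]))            # higher IP alone, two-server cell
print(can_win(three, [AFSDB02]))          # lowest IP alone, three-server cell
print(can_win(three, [AFSDB01, AFSDB03])) # the two higher IPs together
```

With three servers no single vote is enough, so any two reachable servers can elect a coordinator between them.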
ianwfor reference, it looks like the openssl on the RAX console servers is borked.  https://github.com/openssl/openssl/issues/712602:59
ianwopenssl s_client -connect ord-novnc.servers.console.rackspacecloud.com:443 -cipher DEFAULT@SECLEVEL=1 works03:00
ianwopenssl s_client -connect ord-novnc.servers.console.rackspacecloud.com:443 -cipher DEFAULT@SECLEVEL=2 does not03:00
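The same comparison can be scripted, as a rough sketch, with Python's standard ssl module. The helper names here are made up for illustration, and this assumes a Python build linked against an OpenSSL version that understands the `@SECLEVEL` cipher-string syntax; the result depends on the remote server and the local OpenSSL policy, so no outcome is claimed.

```python
import socket
import ssl


def make_context(seclevel):
    """Build a client context pinned to the given OpenSSL security level."""
    ctx = ssl.create_default_context()
    ctx.set_ciphers(f"DEFAULT@SECLEVEL={seclevel}")
    return ctx


def handshake_ok(host, seclevel, port=443, timeout=10):
    """Return True if a TLS handshake with host completes at this level."""
    ctx = make_context(seclevel)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False


if __name__ == "__main__":
    host = "ord-novnc.servers.console.rackspacecloud.com"
    for level in (1, 2):
        print(level, handshake_ok(host, level))
```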
*** cloudnull has joined #opendev03:15
*** ykarel has joined #opendev03:16
ianweverything points to the same issue as the github one.  sslprobe shows it as an affected system.  https://decoder.link/trace shows the remote site responding with  "Signature Algorithm sha1+rsa (2+1)"03:24
*** akrpan-pure has joined #opendev03:34
*** LowKey has quit IRC04:22
*** akrpan-pure has quit IRC04:34
openstackgerritIan Wienand proposed opendev/system-config master: Add afsdb03 openstack.org  https://review.opendev.org/c/opendev/system-config/+/77792404:55
*** cloudnull has quit IRC04:58
ianwinfra-root (corvus): ^ reviews appreciated but i believe we (I) want to babysit that by merging, deploying, adding dns SRV records, then restarting afsdb processes.  i have added the server to dns but not the srv records ATM04:58
*** cloudnull has joined #opendev05:00
*** akrpan-pure has joined #opendev05:20
*** cloudnull has quit IRC05:31
*** cloudnull has joined #opendev05:33
openstackgerritIan Wienand proposed opendev/system-config master: Add review01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/77792505:36
*** akrpan-pure has quit IRC05:49
*** marios has joined #opendev05:56
*** cloudnull has quit IRC06:13
*** cloudnull has joined #opendev06:14
openstackgerritIan Wienand proposed opendev/zone-opendev.org master: Update review01.opendev.org entries  https://review.opendev.org/c/opendev/zone-opendev.org/+/77792606:15
*** ralonsoh has joined #opendev06:32
*** slaweq has joined #opendev07:08
*** cloudnull has quit IRC07:10
*** whoami-rajat has joined #opendev07:11
*** cloudnull has joined #opendev07:15
*** ykarel has quit IRC07:20
*** sboyron has joined #opendev07:21
*** ykarel has joined #opendev07:26
*** eolivare has joined #opendev07:35
*** brinzhang has joined #opendev07:37
*** rpittau|afk is now known as rpittau08:04
*** cloudnull has quit IRC08:10
*** cloudnull has joined #opendev08:14
*** tosky has joined #opendev08:20
*** fressi has joined #opendev08:23
*** hemanth_n has joined #opendev08:34
*** gnuoy has quit IRC08:57
*** zbr1 has joined #opendev08:58
*** zbr has quit IRC09:01
*** zbr1 is now known as zbr09:01
*** zbr8 has joined #opendev09:06
*** ykarel is now known as ykarel|lunch09:06
*** zoharm has joined #opendev09:07
*** zbr has quit IRC09:08
*** zbr8 is now known as zbr09:08
*** brinzhang_ has joined #opendev09:15
*** brinzhang has quit IRC09:18
*** ttx has quit IRC09:20
*** ttx has joined #opendev09:21
*** hashar has joined #opendev09:21
*** zbr1 has joined #opendev09:27
*** zbr1 has quit IRC09:27
*** zbr has quit IRC09:29
*** jpenag is now known as jpena09:42
*** zbr has joined #opendev09:43
*** ykarel|lunch is now known as ykarel09:54
openstackgerritJonathan Rosser proposed opendev/system-config master: Add Debian Bullseye to the reprepro config  https://review.opendev.org/c/opendev/system-config/+/77796810:05
*** calcmandan has quit IRC10:25
*** calcmandan has joined #opendev10:25
openstackgerritJonathan Rosser proposed opendev/system-config master: Add Debian Bullseye to the reprepro config  https://review.opendev.org/c/opendev/system-config/+/77796810:27
*** fressi has left #opendev10:38
*** fbo|off is now known as fbo10:52
*** fressi has joined #opendev10:54
*** toomer has joined #opendev11:22
*** yoctozepto9 has joined #opendev12:17
*** yoctozepto has quit IRC12:17
*** yoctozepto9 is now known as yoctozepto12:17
*** lpetrut has joined #opendev12:30
*** jpena is now known as jpena|lunch12:32
*** zbr has quit IRC12:34
*** zbr has joined #opendev12:36
*** hemanth_n has quit IRC12:44
*** tbarron|out is now known as tbarron12:51
*** mkowalski has quit IRC12:59
*** mkowalski has joined #opendev13:00
openstackgerritSorin Sbârnea proposed openstack/project-config master: Add tripleo-ci-health-queries project  https://review.opendev.org/c/openstack/project-config/+/77799113:07
*** ykarel has quit IRC13:19
*** ykarel has joined #opendev13:20
*** jpena|lunch is now known as jpena13:24
openstackgerritSorin Sbârnea proposed openstack/project-config master: Add tripleo-ci-health-queries project  https://review.opendev.org/c/openstack/project-config/+/77799113:27
fungizbr: so i gather you have a repo you want to import which will use a different default branch than master. now that we have a gerrit version which can support it, we'll need to make sure that on import the default branch gets set to the correct name, and that replication to gitea respects it as well14:26
zbri am wondering if it is worth the trouble, i could rename the branch on the source repo to avoid the extra effort14:27
zbrsomehow I have the impression that there are lots of bugs related to using a default branch other than master.14:27
fungialso since this looks like it's going to be an official tripleo deliverable, you probably want to be sure that team is okay with having one repo with a default branch named differently than its others14:28
zbrand i am not sure yet if I can afford to pay the price to be the first14:28
*** gnuoy has joined #opendev14:29
*** ralonsoh_ has joined #opendev14:30
*** ralonsoh has quit IRC14:30
fungiwe have a high level of confidence that zuul, gerrit and gitea will behave themselves regardless of the default branch name (though you may need to tell zuul what default branch to look for in that repo), it's mostly our project creation and import scripts which are likely to need some tweaks14:36
fungibut yes, the easy workaround is to just rename that branch to master on the repo you want imported14:37
zbri already switched source to use master, for this case is better to keep consistency.14:37
*** ralonsoh_ is now known as ralonsoh14:37
zbrif it had been a purely new project, not a utility repo, it would maybe be worth the effort.14:38
zbri did hear something about git itself switching to main for new repos but I am not sure if that went out yet.14:38
fungiyeah, i've been following their mailing list. there was a proposal in december, but so far nothing which has been accepted14:39
fungithough there didn't seem to be a lot of resistance to the idea, so i expect it will happen at some point14:39
*** fressi has quit IRC14:40
zbrfunny bit is that git itself is an offensive word (not the tool)14:40
fungizbr: unrelated (i think), but we have tests passing for everything i want to see in git-review 2.0.0 now, it's all under topic:gitreview-214:41
fungiit's that one late entrant data loss fix and supporting test helpers, plus minor python packaging and docs cleanup14:42
zbri will have a look now. i was reading the updates and waiting for a ping.14:42
fungiand the python 2.9 testing14:42
zbr2.9? i hope is a typo ;)14:42
fungihah, 3.9 yes ;)14:42
fungizbr: 777848 might be good to go in first, since that takes care of a lot of random job failures due to test timeouts14:44
fungiotherwise there will probably be more rechecking to get the others to merge14:44
zbrI already +W it and another one, once done i will get the others one by one.14:50
fungithanks! once things are merged i'll push a 2.0.0.0rc1 prerelease as a test, to make sure everything's working for release publication (like i did with bindep)15:01
*** Dmitrii-Sh has quit IRC15:10
*** Dmitrii-Sh has joined #opendev15:17
*** artom has joined #opendev15:20
*** lpetrut has quit IRC15:29
corvusfungi: have we thought about granting all auth users the ability to edit hashtags?15:33
fungicorvus: yes, i think some projects were just testing them out first before we looked at setting it in the central config15:35
corvusfungi: looks like they're all -core though15:35
fungiright, concern was raised by those teams that random users might unset hashtags on changes. i think that's probably no more of a concern than other sorts of vandalism already available to them though15:36
fungii think ironic was talking about extending it to all users on theirs once they were comfortable with how it was working for core reviewers15:36
corvusyeah.  it's *really* useful for all contributors to be able to help organize changes for review.  it was invaluable for me as a non-core gerrit contributor during the gerrit hackathon.  from my pov, if someone gets over-eager and adds the "critical-bugfix" tag to a change which isn't -- that's easy to gently correct.15:38
toomerHi fungi, We got another issue pushing the code to OpenDev Gerrit, this time for the openstack/releases repository.15:39
fungitoomer: thanks, same error again?15:40
toomerWould you have some time to investigate it together ?15:40
openstackgerritJames E. Blair proposed openstack/project-config master: Allow all registered users to edit hashtags on all Zuul projects  https://review.opendev.org/c/openstack/project-config/+/77801215:40
toomerYes, let me paste the error15:40
fungii see it in the gerrit error log15:40
toomerhttp://paste.openstack.org/show/803099/15:41
fungilooks like gibi was encountering it15:41
toomerI have done git fsck - it looks good15:41
gibiyepp it is me where the commit is originating from but it goes through gerrit.nordix.org (for legal reasons)15:41
toomerI also see the missing tree: 98eedc4dae5087acbb38d5b2a0764393539ad09815:41
gibihence toomer (who is admin there) investigating15:42
fungiwhat does "goes through" mean?15:42
gibifungi: I push to gerrit.nordix.org and a jenkins job there pushes it forward to review.opendev.org15:42
fungioh, got it, so it's git run by some jenkins job which is encountering the error15:43
gibiyes15:43
toomerWe have mirrors of OpenDev repositories which are then pushed by our Jenkins to OpenDev15:43
gibifungi: I think this is visible for the public https://jenkins.nordix.org/job/opendev-openstack-releases-push-upstream/152/console15:43
fungithanks, yep i can try to correlate https://jenkins.nordix.org/job/opendev-openstack-releases-push-upstream/ failures to the timestamps we see in the gerrit error log too15:45
fungiany idea what version of git it's using?15:45
toomerI'm on the slave server from where the pushes are made ....15:45
toomergit version 2.17.115:45
toomerCan you share the stack from server ?15:46
toomerMaybe we will be able to figure something out15:46
fungisure, just a sec15:46
fungitoomer: http://paste.openstack.org/show/80310115:50
*** zoharm has quit IRC15:50
fungiand `git show 98eedc4dae5087acbb38d5b2a0764393539ad098` in that bare repo on the gerrit server's filesystem does have a tree in it15:51
toomeryes15:51
toomerhttp://paste.openstack.org/show/803099/15:51
clarkbmy suggestion last week was that maybe it is git protocol v1 that new gerrit is struggling with and it would be worth trying v215:52
fungiyep, i mean i ran git show on the gerrit server too, and it has that tree15:52
fungibut confirming it's also present on the client is good15:53
fungithanks15:53
toomerIt's on the Gerrit server as well15:56
toomerhttp://paste.openstack.org/show/803102/15:56
toomerHere are the last 3 commits on the OpenDev for the release repo15:58
toomer5318b88b7 Zuul                 Merge "Release versions for ansible-roles"15:58
toomer1fea88ecc Zuul                 Merge "ldappool 3.0.0"15:58
toomer1781bc796 Zuul                 Merge "Optional list of changes in commit message for auto release"15:58
toomerHere are the last 3 commit for the gibi change15:58
fungicool, so basically at this point we have our gerrit claiming unreachability in the push, (perhaps erroneously) faulting that tree for missing, then eventually working fine at some later time15:58
toomere22050a76 Balazs Gibizer       Wallaby Cycle Highlight for Nova15:58
toomer1fea88ecc Zuul                 Merge "ldappool 3.0.0"15:58
toomer1781bc796 Zuul                 Merge "Optional list of changes in commit message for auto release"15:58
gibifungi: yeah, this type off issue so far always resolved magically in less than 24 hours15:59
gibis/off/of15:59
fungii'm with clarkb that there's a good chance this is some incorrect optimization in the packfile getting pushed, and that using git protocol v2 might eliminate it. but debuggability for this is sorely lacking15:59
clarkbfungi: it could also be that the game of telephone here is more susceptible to this issue16:01
clarkbmaybe gibi can try pushing directly ?16:01
toomerThese options could give us some more info on this16:01
toomerGIT_CURL_VERBOSE=1 GIT_TRACE=116:01
fungiyeah, wouldn't hurt to export those in your job at least temporarily and try to catch another failure16:02
gibiclarkb: I can try that as a test but if that succeeds then we lose the ability to reproduce with this patch from nordix. And normally I have to push through nordix for legal reasons16:02
clarkbgibi: right, I'm just pointing out we are going through potentially 4 different git versions here to push code and it's breaking. 99.99% of all other pushes involve two16:03
clarkb(in this case its opendev gerrit jgit, nordix gerrit, your client, jenkins node client)16:03
gibiclarkb: sure. I can unblock my work by pushing directly but that is just a one time solution16:04
clarkbwell, I'm less interested from an unblocking of work and more from a debugging standpoint. It would be useful info to know if eliminating the two intermediate git versions functions more reliably16:05
clarkbwe also learn something if it doesn't work more reliably16:05
gibifungi, toomer: if you agree then I can try the direct push16:05
gibimy local git client is16:05
gibi$ git --version16:05
gibigit version 2.30.116:05
gibiclarkb: meanwhile I was able to push through other patches via nordix, e.g. in nova.16:06
toomerI would prefer to spend some more time investigating this before we try direct push16:07
gibiack16:08
toomerHere is a GIT trace16:08
toomerhttp://paste.openstack.org/show/803103/16:08
fungii'm in a meeting for the next ~50 minutes (and another after that) so i'm less available but will try to take a look momentarily16:11
clarkb(I too have a meeting)16:12
toomerI don't so I will keep looking on this16:17
*** ykarel is now known as ykarel|away16:28
openstackgerritJames E. Blair proposed openstack/project-config master: Allow all registered users to edit hashtags on all Zuul projects  https://review.opendev.org/c/openstack/project-config/+/77801216:31
*** ykarel|away has quit IRC16:33
toomerIt looks like this type of problem is caused by an optimization which makes git send as little data as possible over the network for the uploaded change16:35
toomerBased on https://stackoverflow.com/questions/16586642/git-unpack-error-on-push-to-gerrit there is a workaround for this issue16:36
clarkbtoomer: that is why I suspected the git protocol previously. Since that affects how git optimizes the pushes16:37
toomer      --[no-]thin16:37
toomer           These options are passed to git-send-pack(1). A thin transfer significantly reduces the amount of sent data when the sender and receiver16:37
toomer           share many of the same objects in common. The default is --thin.16:37
clarkbtoomer: I had thought that you may be using v2 which we enabled on the server side, but 2.17.1 is v1 only. It is possible that v2 optimizes this problem properly though and may be worth trying a v2 client16:38
toomerUsing --no-thin will increase the load on the OpenDev Gerrit16:40
clarkbyes, if v2 works using that would be preferable16:40
clarkbbut you need git >=2.18 for support and 2.26 to use it by default16:40
toomerclarkb: Do you know since which git version the v2 is supported ?16:40
toomerThanks16:41
toomergit -c protocol.version=2 Will enforce the new version on git client  >=2.1816:44
clarkbyup and 2.26 or newer should use v2 by default16:44
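The version thresholds discussed above could be folded into the push job roughly like this. It is a sketch only: the function names are hypothetical, the remote and refspec are placeholders, and whether protocol v2 actually avoids the "missing tree" failure on push was not confirmed in this conversation.

```python
import re


def parse_git_version(text):
    """Turn output such as 'git version 2.17.1' into a tuple like (2, 17, 1)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def push_command(version, remote, refspec):
    """Choose push arguments based on the client's git version.

    Clients at 2.18 or newer can be asked to speak protocol v2;
    older clients fall back to the --no-thin workaround.
    """
    if version >= (2, 18):
        return ["git", "-c", "protocol.version=2", "push", remote, refspec]
    return ["git", "push", "--no-thin", remote, refspec]


# Placeholder remote and refspec, for illustration only.
old = push_command(parse_git_version("git version 2.17.1"),
                   "gerrit", "HEAD:refs/for/master")
new = push_command(parse_git_version("git version 2.30.1"),
                   "gerrit", "HEAD:refs/for/master")
print(old)
print(new)
```

The resulting list can be handed to `subprocess.run` in the job; keeping the decision in one function makes it easy to drop `--no-thin` once the build nodes are upgraded.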
toomerIt looks like the --no-thin option helped, but I'm not 100% sure16:46
toomerhttp://paste.openstack.org/show/803106/16:46
toomerMaybe the problem just went away as previously16:46
clarkbya hard to isolate when the problem mysteriously fixes itself16:47
fungiright, i expect you'll have to keep a potential fix in place for a while and see if the problem comes back16:48
toomerMy plan is to use --no-thin option for the time being and then upgrade the jenkins slaves to Ubuntu 20.04 and git 2.25.116:48
toomerand use protocol v2 as suggested by clarkb16:49
fungitoomer: good find! and excellent info, i see the same failure occasionally in our log for other users as well, so now we know what we can perhaps suggest to them as a workaround. thanks!16:49
fungiinfra-root: ^ something to keep in mind if any other users report "missing tree" errors when pushing changes16:50
*** rpittau is now known as rpittau|afk17:00
*** marios is now known as marios|out17:07
fungizbr: d'oh! my fault, 777799 is going to need to merge before any others, since it fixes a configuration error related to the switch to the opendev tenant17:15
*** marios|out has quit IRC17:16
openstackgerritMerged openstack/project-config master: Allow all registered users to edit hashtags on all Zuul projects  https://review.opendev.org/c/openstack/project-config/+/77801217:23
*** eolivare has quit IRC17:38
openstackgerritMerged opendev/git-review master: Update jobs for opendev tenant  https://review.opendev.org/c/opendev/git-review/+/77779917:41
*** mlavalle has joined #opendev17:58
*** ralonsoh has quit IRC18:08
*** jpena is now known as jpena|off18:10
*** hashar has quit IRC18:10
openstackgerritMerged opendev/git-review master: Increase test timeout to 5 minutes  https://review.opendev.org/c/opendev/git-review/+/77784818:28
*** sboyron has quit IRC18:46
fungi#status log filed spamhaus css removal for lists.katacontainers.io ipv6 address18:47
openstackstatusfungi: finished logging18:47
fungi#status log filed spamhaus pbl removal for lists.katacontainers.io ipv4 address18:47
openstackstatusfungi: finished logging18:47
fungiianw: was there a theory as to why the gitea01 backups are still failing? i don't remember now18:54
clarkbfungi: I theorized that maybe we had restarted the mariadb container around that time, which would have killed the connection doing the dump. I think that was ruled out though and haven't heard any new theories since18:57
fungiyeah, it seems to happen daily19:09
fungibased on e-mails coming to the shared root inbox19:10
*** stand has joined #opendev19:12
*** hamalq has joined #opendev19:24
openstackgerritJeremy Stanley proposed opendev/git-review master: Remove comments for unstaged/uncommitted tests  https://review.opendev.org/c/opendev/git-review/+/77805619:30
*** toomer has quit IRC19:33
openstackgerritMerged opendev/git-review master: Add test helpers for unstaged/uncommitted changes  https://review.opendev.org/c/opendev/git-review/+/77768719:36
*** stevebaker has joined #opendev20:01
fungiwow, we're already almost caught up on node requests for the day, and only topped out at a backlog of ~400. that's a pleasant change20:16
fungiokay, digging into gitea01 backup failures, "Streaming script /etc/borg-streams/mysql failed!" even though the line immediately prior to that says "terminating with success status, rc 0"20:20
fungisomething doesn't add up20:20
fungiahh20:20
fungi"mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `action` at row: 7064"20:21
fungii should have scrolled up farther20:21
fungiit's a local mysqld though, so unlikely to be network connectivity at fault20:22
clarkbunless the process is dying or something like that20:22
clarkbmay also be a local socket vs tcp thing?20:22
fungiprocess start time for mysqld was Feb2320:23
fungiso running at least a week20:23
fungigoing to see if i can work out where the mysql container stores logs20:23
fungioh, says it's syslogging it20:25
fungialso the dump didn't get oom-killed, last oom killer event logged was almost a month ago20:26
fungii don't actually see anything logged by mariadb in the syslog20:27
fungioho! here we go... it's in /var/log/containers/docker-mariadb.log.120:28
fungi"Mar  1 05:54:50 gitea01 docker-mariadb[704]: 2021-03-01  5:54:50 192568 [Warning] Aborted connection 192568 to db: 'gitea' user: 'root' host: 'localhost' (Got an error writing communication packets)"20:28
funginot that it's all that helpful20:28
clarkbfungi: host localhost may be the local unix socket not tcp20:29
clarkbmaybe we should switch it to 127.0.0.1 to use tcp20:29
fungihttps://cloudlinux.zendesk.com/hc/en-us/articles/360010985240-MySQL-5-7-and-Got-an-error-writing-communication-packets-message20:29
fungithat suggests the query cache could be involved20:29
fungirefers to https://bugs.mysql.com/bug.php?id=8463920:30
clarkbI mention the socket thing because it is in a container and the backup runs from the host iirc ?20:30
clarkbbut I could be wrong about that20:30
fungilooking at that bug, maybe running mysqldump from within the container could help?20:31
clarkbfungi: you should be able to tcp from the host or ya doing the dump within the container would also get you socket access20:34
fungidoubtful it's actually the query cache bug, since mariadb docs say with 10.1.7 and later (we're using 10.4) you have to explicitly set a nonzero query_cache_size, and i can't find anywhere we're doing that20:37
fungii don't expect they would do that in the container image we're consuming either20:37
fungialso leaves me wondering why we don't hit this with, say, the gerrit backups20:39
clarkbgerrit is remote20:40
clarkbso you have to use tcp20:40
clarkb(this is why I suspect the socket :) )20:40
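One way to test clarkb's socket theory would be to force the dump over TCP and compare. The sketch below only builds the mysqldump argument list; the function name is hypothetical, the real /etc/borg-streams/mysql script is not shown in this log, and the port is assumed to be the default 3306.

```python
def mysqldump_command(database, use_tcp=True, host="127.0.0.1", port=3306):
    """Build mysqldump arguments that pin the client transport.

    With use_tcp=True the client connects to host:port over TCP instead
    of the local unix socket that "localhost" normally implies.
    """
    cmd = ["mysqldump"]
    if use_tcp:
        cmd += ["--protocol=TCP", f"--host={host}", f"--port={port}"]
    else:
        cmd += ["--protocol=SOCKET"]
    cmd.append(database)
    return cmd


print(mysqldump_command("gitea"))
print(mysqldump_command("gitea", use_tcp=False))
```

If the TCP variant fails at the same table row, the transport is probably not the cause.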
*** mgagne has joined #opendev20:41
ianwfungi: yeah i had a quick look yesterday at gitea backups.  it seems to be isolated to failures to vexxhost20:47
ianwbacking-up to vexxhost i should say.  which suggests to me networking; because it sometimes works20:48
clarkbianw: oh it works backing up to rax?20:48
fungicould it be that the problem is not actually the local connection between the cronjob and the mariadb server, but the remote borg ssh socket dying and that's prematurely terminating the query stream?20:48
openstackgerritMathieu Gagné proposed openstack/project-config master: Enable inap-mtl01 region  https://review.opendev.org/c/openstack/project-config/+/77806420:50
ianwthe error points are suspiciously similar http://paste.openstack.org/show/BR1pVA2GsVGOAZwCk3xJ/20:50
clarkbmgagne: re ^ I take it that implies the leaky IP address problem is expected to be happier now?20:51
ianwthree days in a row it failed at "Lost connection to MySQL server during query when dumping table `action` at row: 6968"20:51
clarkbmgagne: also I want to say the old limit was ~159 not 200. I'm good with 200 if you are :)20:51
mgagnethat's what we have, assuming it's still 8 VCPUs per VM20:51
mgagneOur version does not support the option suggested by melwitt, but all compute nodes have been emptied from the previous instances so no leftover here.20:52
mgagneold limit was ~190 IIRC20:52
openstackgerritClark Boylan proposed opendev/system-config master: Add tools being used to make sense of gerrit account inconsistencies  https://review.opendev.org/c/opendev/system-config/+/77784620:53
clarkbfungi: ^ that tries to address your comments. I've run it as an unprivileged user. The next step is to run it as an admin which should get rid of the "this email has only one account" issues and then see what our dataset looks like20:54
clarkbfungi: if you get a chance to review that before we run it as admin that would be excellent20:54
clarkbmgagne: oh sorry looks like the old value we had for inap max-servers was 95. I'm happy to try at 200 and take it from there though20:55
fungisure thing20:55
clarkbmgagne: I'll go ahead and approve the change, thank you20:55
fungiyes, huge thanks! this will help tremendously20:55
*** toomer has joined #opendev20:55
mgagnelet me know if you see any issue so we can address them right away20:56
fungiabsolutely20:56
ianwclarkb: any particular thoughts on adding afsdb03 @ https://review.opendev.org/c/opendev/system-config/+/777924 ?  it seems like a decent way to get us testing focal20:56
auristorianw: how do the DNS SRV records get updated?20:58
clarkbianw: that sounds reasonable to me. Should I approve it?20:59
clarkbauristor: that zone is hosted on a dnsaas system and we typically add records to it by hand20:59
auristorack20:59
ianwauristor: for openstack.org it's a manual process via a RAX hosted option20:59
clarkbianw: I +2'd it, will let you approve when ready21:00
auristorI've +121:00
*** toomer has quit IRC21:00
mgagneIIRC, we might have a thundering herd issue if Nodepool tries to create all of them at the same time, lets see21:00
ianwclarkb / auristor : thanks ... i'll do it after school run here; deploy the change, update the SRV records and restart things in quick succession21:01
*** auristor has quit IRC21:02
fungimgagne: if you do, it'll probably be the hypervisor hosts all trying to warm their image caches21:06
fungithat can overload storage networks rather quickly21:06
mgagneyes, that's what I suspect will happen21:06
mgagnemainly that Nova and/or Nodepool will give up and instances will go into ERROR state and get deleted/recreated for a while.21:07
fungimgagne: though you may be in luck, we just caught up on our node request backlog in the past few minutes so the count of nodes in use is dropping: https://grafana.opendev.org/d/5Imot6EMk/zuul-status?orgId=121:07
fungithat may naturally lead to a more gradual utilization21:08
openstackgerritMerged openstack/project-config master: Enable inap-mtl01 region  https://review.opendev.org/c/openstack/project-config/+/77806421:12
*** auristor has joined #opendev21:16
*** slittle1 has joined #opendev21:20
*** LowKey has joined #opendev21:28
*** toomer has joined #opendev21:44
*** sboyron has joined #opendev21:53
*** toomer has quit IRC22:04
*** whoami-rajat has quit IRC22:07
openstackgerritKevin Carter proposed zuul/zuul-jobs master: Make .sh browsable on swift logs  https://review.opendev.org/c/zuul/zuul-jobs/+/73179522:22
*** sboyron has quit IRC22:25
fungimgagne: looks like we started building in there a little over an hour ago, some nodes took a few minutes to come active but no errors that i see, we peaked around 60 nodes a few minutes ago: https://grafana.opendev.org/d/tazoteEGz/nodepool-inap?orgId=122:34
fungiso far so good!22:34
mgagneglad to hear it22:34
kopecmartinianw: hi, could you give me access to refstack.openstack.org (similarly like i have one to reftstack01.openstack.org), I wanna compare a few configs as a last resort22:37
fungikopecmartin: yep, i can copy your public key from reftstack01, just a moment22:38
kopecmartinfungi: thanks  .. no rush, i need to go to sleep anyway now22:38
ianwkopecmartin: ok, is there like a one line summary of where it's at?22:39
fungikopecmartin: what account are you logging into on reftstack01.openstack.org?22:39
fungiis it the "ubuntu" account maybe?22:39
kopecmartinfungi: kopecmartin account22:40
* fungi looks again22:40
kopecmartinfungi: 104.239.166.1522:40
kopecmartinthat's where refstack01 is22:40
fungikopecmartin: are you sure you're not logging into a held job node you're calling reftstack01.openstack.org?22:40
kopecmartinfungi: oh, yeah, it's a held node22:41
fungiactual refstack01 in dns is 104.239.144.25022:41
ianwyeah that's the held node from https://etherpad.opendev.org/p/refstack-docker22:41
ianwi don't see why we can't give you access to the old server if required, but we might want to do it via cfg mgmt22:43
fungii'm personally okay with the idea of kopecmartin having ssh access to the production server for troubleshooting purposes, but we should add the account the way we...22:43
fungiyeah that22:43
fungiand get an acknowledgement of the server ssh access policy22:44
ianwkopecmartin: just grep "extra_users" in system config and follow along if that's what you'd like to do; of course happy to grab anything required too22:46
fungiclarkb: when you get back, see comments on 777846, i expect it's not doing quite what you'll want there22:52
openstackgerritJames E. Blair proposed ttygroup/gertty master: Add support for searching for hashtags  https://review.opendev.org/c/ttygroup/gertty/+/77808822:57
openstackgerritMartin Kopec proposed opendev/system-config master: SSH access to refstack for kopecmartin  https://review.opendev.org/c/opendev/system-config/+/77809023:04
kopecmartinianw: fungi is that ok^^23:05
ianwkopecmartin: that won't deploy on the old server, which is what you really want right?23:06
kopecmartinianw: yes23:06
ianwyeah, have to look but refstack-docker.yaml is the var file for the new server23:06
ianwkopecmartin: also good just to confirm in the changelog you've seen https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#ssh-access23:07
*** slaweq has quit IRC23:07
clarkbfungi: yes, I mentioned this last week. Its a new more recently than. It shouldn't accidentally identify accounts that were used recently as non recent, but may identify non recent accounts as recently used23:08
clarkbfungi: I was thinking we'd start here for now to whittle the list down, then get smarter with the smaller list23:08
fungiahh, yeah got it. if we wanted it to be more accurate, for loop over the returned results (maybe also with pagination handling), then iterate through the patchsets and comments for each result23:09
clarkbyup, I think  that may be annoyingly slow if we try to do it for the full set now23:09
clarkbbut if we cut the list in half or something like that then doing that to further reduce the list down makes sense to me23:10
ianwkopecmartin: i guess the "refstack" group doesn't actually match the old server, weirdly23:10
clarkbfungi: do you agree it should "fail" in the safe manner I assert? and if so any objections to me running that as an admin nowish?23:12
fungiyeah, it will potentially consider stale accounts as recently used, but that's safe23:12
kopecmartinianw: ah, let's forget that, I don't need it, I've found a workaround which will solve the problem (ProxyPass in the vhost config) with the missing api part of URLs, i'll propose a patch tomorrow23:13
fungiclarkb: also it's not all that slow. i mean the openstack election tooling looks up recent commit activity for a single account in a few seconds23:13
fungiand generates entire electoral rolls of >1k accounts based on a year of change activity in maybe half an hour23:14
clarkbfungi: I may be missing something, but I don't think there is a way to ask for the most recent comment directly? so you have to lookup all the recent changes and scan them?23:14
clarkbthe current script takes about 10 minutes to run just being simple23:14
ianwkopecmartin: ok, feel free to follow-up and we can make sure it installs your user on the old and new hosts if you want23:15
fungiclarkb: yep, that's basically what i've done elsewhere. for loop over the changes returned, tell it to include all patchsets and comments, then for loop over the patchsets and comments23:15
fungibut as you say, no need to complicate it for now23:15
clarkbalright I'll go ahead and run it as admin in a minute23:18
openstackgerritMerged openstack/project-config master: intel-nfv-ci-tests: move zuul definition into repo  https://review.opendev.org/c/openstack/project-config/+/77751123:24
clarkbI should've given this a progress bar23:35
fungianother thing i've found after writing my third or fourth gerrit data scraper... the trick is to request as much data as you can in the fewest number of queries possible, and then do the rest with client-side filtering/iteration23:38
fungiindividual requests to the gerrit api incur a ton of overhead23:38
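fungi's approach (one broad query, then client-side iteration over patchsets and messages) might look roughly like the sketch below. It assumes the change list was fetched with the `ALL_REVISIONS` and `MESSAGES` options so each entry carries `revisions` and `messages`; the function names and the sample data are hypothetical, and the HTTP request itself is left out.

```python
import json

# Gerrit prefixes JSON responses with this line to defeat XSSI attacks.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip the XSSI prefix from a Gerrit REST response and decode it."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def active_account_ids(changes):
    """Collect ids of accounts that own, uploaded to, or commented on changes."""
    seen = set()
    for change in changes:
        owner = change.get("owner", {})
        if "_account_id" in owner:
            seen.add(owner["_account_id"])
        for revision in change.get("revisions", {}).values():
            uploader = revision.get("uploader", {})
            if "_account_id" in uploader:
                seen.add(uploader["_account_id"])
        for message in change.get("messages", []):
            author = message.get("author", {})
            if "_account_id" in author:
                seen.add(author["_account_id"])
    return seen


# Made-up response body standing in for a single /changes/ query result.
sample = ")]}'\n" + json.dumps([
    {
        "owner": {"_account_id": 1},
        "revisions": {"abc": {"uploader": {"_account_id": 2}}},
        "messages": [{"author": {"_account_id": 3}}, {"message": "no author"}],
    },
])
print(sorted(active_account_ids(parse_gerrit_json(sample))))
```

Any account id missing from the returned set had no activity in the queried window, which is the check the per-account lookups were approximating.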
clarkbya in this case I think the number of requests is minimal. It gets the active then inactive accounts. THen checks for ownership and reviews done by them23:39
clarkbI guess I could query more accounts at once23:40
funginot easily with the way you're doing it now though23:40
fungisince you're taking advantage of assuming the first returned result is for the one account you're asking about23:40
clarkbhrm ya I think you'd have to lookup account details in all cases for each of the returns to get that info anyway23:41
clarkbanyway its done running23:42
clarkbI'll put the results on review so others can check them too23:42
clarkbinfra-root review:~clarkb/gerrit_user_cleanups/external_id_conflict_classifications has the results of that script run23:46
ianw13901,13902,13903,13905,13906 ... that's a series23:49
openstackgerritMerged opendev/system-config master: Add afsdb03 openstack.org  https://review.opendev.org/c/opendev/system-config/+/77792423:53
clarkbianw: ya there are definitely a few in here where you could tell people were struggling23:55
fungisome people will clearly try a lot of different ways to get their account to work (or maybe the same way over and over)23:56
*** gibi has quit IRC23:58
