Thursday, 2023-04-06

opendevreviewMerged opendev/system-config master: Add static02 to inventory  https://review.opendev.org/c/opendev/system-config/+/87938300:24
opendevreviewMerged opendev/system-config master: install-launch-node: upgrade all packages  https://review.opendev.org/c/opendev/system-config/+/87971200:24
*** Trevor is now known as Guest1012401:22
ianwahh, i guess vos release needs an admin key, not the wheel volume key01:48
opendevreviewIan Wienand proposed openstack/project-config master: wheel builds : move to individual releases  https://review.opendev.org/c/openstack/project-config/+/87972204:03
opendevreviewIan Wienand proposed openstack/project-config master: wheel builds : move to individual releases  https://review.opendev.org/c/openstack/project-config/+/87972204:11
opendevreviewIan Wienand proposed openstack/project-config master: wheel builds : move to individual releases  https://review.opendev.org/c/openstack/project-config/+/87972204:18
noonedeadpunkclarkb: ianw it will help, as upgrading existing systems does not break07:30
noonedeadpunkcentos9 I mean07:30
noonedeadpunkas you already have gpg imported to rpm, and gnupg update breaks adding new gpgs not existing ones07:30
noonedeadpunkbut yeah, given it's already built...07:31
fricklerwe could still delete the latest build if that is showing failures and revert to the previous one07:54
apevec<noonedeadpunk> "centos9 I mean" <- -3 build was reverted in the latest CS9 compose, now it just needs to hit the mirrors https://composes.stream.centos.org/production/CentOS-Stream-9-20230405.1/compose/BaseOS/source/tree/Packages/gnupg2-2.3.3-2.el9.src.rpm11:23
mnasiadkaHello, I'm seeing some POST_FAILUREs (e.g. https://zuul.opendev.org/t/openstack/build/67162651408647bcb7f81b5cab808e2e) - maybe some issues with log uploads?11:38
noonedeadpunkapevec: aha, good news I assume11:39
noonedeadpunkapevec: another issue I saw is that multiple SIGs, while updating their GPG keys on https://www.centos.org/keys more than a year ago, are still packaging old ones. NFV was fixed really fast yesterday, but now I've spotted Storage with the exact same thing11:40
apevecI'm now asking around when we can expect this on mirrors, then we need our AFS refreshed, which  goes via non-primary CS9 mirror ...11:41
apevecamoralej:  ^ who is the best contact for Storage SIG these days?11:42
noonedeadpunkfrickler: well, at least we had short period during the night when jobs were passing, until images got updated. so for osa at least that would be helpful. but I'm not sure it will fix rest11:42
noonedeadpunkI was told it's ndevos as chair, but not sure how to reach them, except emailing directly11:43
fricklermnasiadka: I checked the latest POST_FAILUREs, only that single one has no logs, so I think you can just recheck.12:06
fungi2023-04-06 10:20:10,289 DEBUG zuul.AnsibleJob.output: [e: 1aeec5b0e51048b4bd5d6cec82ba92d0] [build: 67162651408647bcb7f81b5cab808e2e] Ansible output: b'TASK [upload-logs-swift : Upload logs to swift] ********************************'12:08
fungi2023-04-06 10:20:21,337 DEBUG zuul.AnsibleJob.output: [e: 1aeec5b0e51048b4bd5d6cec82ba92d0] [build: 67162651408647bcb7f81b5cab808e2e] Ansible output: b'fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that \'no_log: true\' was specified for this result", "changed": false}'12:08
fungiso yeah, log upload failed, but we don't know why12:08
fungii can see from the logs that it wanted to upload to the swift endpoint in ovh-gra112:09
fungii guess if we see more of those, we should check to see if they're all uploading to the same provider/region12:10
apevec<noonedeadpunk> "apevec: another issue I saw..." <- do you have a link to centos-devel discussion where was NFV key updated?12:20
noonedeadpunkapevec: I have no idea where logs are for this. It was yesterday between 16 and 17 UTC12:21
fricklerapevec: replying to messages isn't really a feature in IRC and looks weird in native clients. I would appreciate if you could stop using that while being bridged here12:37
apevecsorry about that, will take notice!12:38
apevecBTW was move to native Matrix considered for opendev channels?12:39
fricklerthere was some discussion some time ago, but quite a bit of opposition to that, too, including myself12:40
*** blarnath is now known as d34dh0r5313:15
genekuoianw  Sure, I'll try to wake up and shadow the process13:53
genekuoclarkb: having a short talk will be great to get more understanding on what tasks infra team is working on and skills needed.13:55
genekuo I'm ok with late evening call which will probably work better for you.13:55
opendevreviewgustavo ornaghi antunes proposed openstack/project-config master: Add Dell Storage App to StarlingX  https://review.opendev.org/c/openstack/project-config/+/87974414:36
opendevreviewgustavo ornaghi antunes proposed openstack/project-config master: Add Dell Storage App to StarlingX  https://review.opendev.org/c/openstack/project-config/+/87974414:45
noonedeadpunkthe thing I hate most about IRC is that you need to maintain a bouncer so as not to lose history, which needs slightly more effort than you usually want to put in14:48
opendevreviewgustavo ornaghi antunes proposed openstack/project-config master: Add Dell Storage App to StarlingX  https://review.opendev.org/c/openstack/project-config/+/87974414:56
clarkbgenekuo: ok cool lets try to schedule something for next week? I think fungi won't be around but that should be fine15:12
fungiyeah, there's no need to include me, but if there's one when i'm not on vacation i'll be happy to help15:26
genekuoclarkb sounds good to me. For late evening, Monday and Wednesday will be best for my time.15:32
genekuoSo Monday, Wednesday morning in your time zone15:32
genekuoTuesday and Thursday will also work if it's better for you 15:33
clarkbgenekuo: Wednesday would be perfect for me I think.15:33
fricklerwhat's that in UTC? mon I'm out, wed I could likely join, too15:33
clarkbfrickler: genekuo I think I could start as early as 1300 UTC Wednesday. 1400 would be better but I'll live15:34
genekuo1400 UTC works for me15:34
frickler14 avoids conflict with kolla meeting, that'd be fine for me, too15:35
clarkbcool see you at 1400 UTC wednesday. I can share a meetpad link for that as we get closer15:35
genekuocool15:38
clarkbI've spot checked static02.opendev.org hosting docs.openstack.org, static.opendev.org, and tarballs.opendev.org via local /etc/hosts overrides. There are a number of things hosted there so it will take some time to double check everything (probably won't finish today but hopefully can get that done tomorrow)17:46
clarkbOh I meant to say the spot checks look good17:47
clarkbwe also need to ensure everything CNAME'd to static01 has done so via the `static CNAME static01` record and not directly17:47
clarkbbut I think I can check that as I check the hosting works since it's all related to digging info out of dns17:47
fungiwe can/should go ahead and merge https://review.opendev.org/879414 ahead of the maintenance, right?17:49
clarkbyes, historically I think we've left that for when we are done. But we can always revert later instead if necessary17:50
clarkbok looks like the ssl cert list for static will be a good cheatsheet for content17:50
fungiall the changes for topic:gerrit-3.7 lgtm, and i approved the associated git-review series17:50
clarkbfungi: git-review series?17:51
opendevreviewMerged opendev/project-config master: Add renames for April 6th outage  https://review.opendev.org/c/opendev/project-config/+/87941417:52
fungitopic:gerrit-3.7+project:git-review17:52
fungimostly your stuff around testing git-review with newer gerrit17:52
fungier, topic:gerrit-3.7+project:opendev/git-review17:52
clarkbah17:52
fungialso the tox to nox switch, which it depended on17:53
fungifigure that's been up plenty long enough for anyone who cares to object17:53
clarkbthese records will not update when we update the static.opendev.org CNAME static02.opendev.org record: devstack.org registry.zuul-ci.org zuul-ci.org www.zuul-ci.org zuulci.org www.zuulci.org gating.dev www.gating.dev17:58
clarkbsome of those are root records and this is expected. Others should probably be updated to point at the cname. I'll work on changes for that as part of the dns update and we can do a staged shutdown of the old server once we think we've got everything to avoid accidents17:58
clarkbfungi: is devstack.org managed through the rax dns stuff?17:59
fungiyes17:59
fungii expect www.zuul-ci.org, www.zuulci.org and www.gating.dev are already cnames, just not (directly) to static0118:00
fungiand i expected wrong!18:00
fungithey're all a/aaaa18:00
clarkbyup I'll update those that can be CNAMEs to CNAMEs to cut down on needing to do extra work the next time this is done18:00
clarkband update the A/AAAA records for those that can't be CNAMEs18:01
fungithanks18:01
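The check described above (which names follow the static.opendev.org CNAME and which need their own update) can be sketched roughly as below. This is a minimal Python illustration, not the tooling used in the channel: the record data, the example.org names, the documentation-range IP address and the helper name are all assumptions, and a real check would query DNS instead of a hard-coded dict.

```python
# Hypothetical record data: name -> (record type, target). Not real DNS output.
RECORDS = {
    "docs.example.org": ("CNAME", "static.opendev.org"),      # follows the alias
    "zuul-ci.org": ("A", "203.0.113.10"),                     # root record, cannot be a CNAME
    "www.zuul-ci.org": ("A", "203.0.113.10"),                 # could become a CNAME
    "legacy.example.org": ("CNAME", "static01.opendev.org"),  # points at the host directly
}


def names_needing_manual_update(records, alias="static.opendev.org"):
    """Return names that will NOT move when `alias` is repointed.

    A name only follows the alias if it is a CNAME whose target is the
    alias itself; A/AAAA records and CNAMEs aimed straight at the old
    host have to be changed separately.
    """
    return sorted(
        name
        for name, (rtype, target) in records.items()
        if not (rtype == "CNAME" and target.rstrip(".") == alias)
    )


if __name__ == "__main__":
    for name in names_needing_manual_update(RECORDS):
        print(name)
```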
clarkbInitially I wasn't going to bother with a short ttl on the static.opendev.org CNAME but now I'm thinking that may be a good idea in case I miss something and need to revert.18:03
fungiin which case i guess we need a followup change to re-default the ttl(s)?18:04
clarkbyes18:04
clarkbnot a big deal I just thought i could get away without doing that18:04
opendevreviewMerged opendev/git-review master: Switch from tox to nox  https://review.opendev.org/c/opendev/git-review/+/87165218:11
opendevreviewMerged opendev/git-review master: Test Python bounds only  https://review.opendev.org/c/opendev/git-review/+/87732118:11
opendevreviewMerged opendev/git-review master: Test old and new Gerrit  https://review.opendev.org/c/opendev/git-review/+/87731318:11
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Update static.o.o CNAME to point at static02  https://review.opendev.org/c/opendev/zone-opendev.org/+/87978018:11
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Remove old static01 records  https://review.opendev.org/c/opendev/zone-opendev.org/+/87978118:11
clarkbI don't think we want to land any of those today just to avoid debugging new static overlapping with gerrit things18:12
clarkbbut I'll get changes up anyway18:12
fungisure18:19
opendevreviewClark Boylan proposed opendev/zone-zuul-ci.org master: Update zuul dns records to the new static02 server  https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/87978218:21
opendevreviewClark Boylan proposed opendev/zone-zuul-ci.org master: Revert short @ record TTLs  https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/87978318:21
clarkbremote:   https://review.opendev.org/c/opendev/zone-gating.dev/+/879784 Point gating.dev at the new static02 server18:26
clarkbthat one doesn't report to us apparently18:26
clarkbthat's all of the code review based updates necessary to do the switch. devstack.org I'll have to do by hand as this is in process18:26
clarkbalso I used the ssl cert generation config to determine which names to look at. I suspect that is fairly complete18:26
fungiour certcheck is based on le config now, so good call yeah18:55
fungiclarkb: minor question on 879780 but i'm not really all that worried about the ordering19:00
fungiso +2 anyway19:00
Clark[m]Ya short answer while I sort out lunch is when I looked at the follow-up change to remove static01 records I realized I didn't really need to move it as much as not delete it19:03
fungiwfm19:57
fungiwe're at t minus one hour until maintenance21:00
fungishould we status notice a reminder and maybe start putting things into disable state?21:00
clarkbsounds good.21:00
clarkblet me update the list of emergency file hosts on the etherpad to include those that the rename playbook touches21:01
fungistatus notice The Gerrit service on review.opendev.org will be offline for extended periods between 22:00 and 23:00 UTC for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/21:02
fungioh, we said two hours, so between 22:00 and 00:0021:02
fungiotherwise lgty?21:02
clarkbya due to the rename too.21:02
clarkbyes lgtm21:02
clarkbhttps://etherpad.opendev.org/p/gerrit-upgrade-3.7 is the etherpad if you want to check the list of hosts21:03
fungi#status notice The Gerrit service on review.opendev.org will be offline for extended periods between 22:00 and 00:00 UTC for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/21:03
opendevstatusfungi: sending notice21:03
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline for extended periods between 22:00 and 00:00 UTC for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/21:03
fungiclarkb: list of hosts lgtm, thanks21:05
clarkbfungi: I'll go ahead and add them on bridge now then21:05
fungigo for it21:06
opendevstatusfungi: finished sending notice21:06
clarkbfungi: that's done if you want to double check it on bridge too21:08
fungican do21:11
fungiclarkb: looks correct on bridge. thanks!21:14
fungii guess we're all set for about the next 30-40 minutes21:15
clarkbI think so21:15
fungifor the second status announcement, do we want to #status alert or just stick with notice?21:17
clarkbI feel like notice is sufficient?21:18
fungii'm cool with it21:19
ianwo/ thanks for sending the notice21:20
fungiwanted to make sure you're able to wake up and enjoy your tea et cetera21:21
fungithere are 3 detached root screen sessions on bridge. we should maybe clean them up. newest is from a month ago21:22
fungiany objections?21:22
clarkbno objections from me. I have a screen I was using for the static and etherpad bootstrapping but it should be owned by my user not root21:23
funginewest one looks like it was related to docker updates21:23
fungipretty sure that can go21:23
fungione from january was a zuul rolling restart21:24
ianwsounds good21:24
fungione from december was a zuul restart as well21:24
fungii'll close out all three21:24
fungistep #3 in the pad is creating a root screen session on review02 right?21:25
ianwlook like the launch-env on bridge updated as we'd hoped now21:25
ianwfungi: was just starting that21:26
fungicool, thanks!21:26
clarkbianw: I've made a suggested edit to the reindex step. Looking at the 3.7 release notes and the upgrade scripting for our test jobs it does a full reindex of everything not just changes21:26
fungii've joined it. was the only one owned by root21:26
clarkb(changes is the slow one so its not like this will make it take much longer just makes it more complete and is safer I think)21:26
ianwoh, thanks!  yeah that's a copy-paste, i intended that to be everything21:27
fungisgtm, and yeah the account index rebuilds quickly21:27
clarkband a note on step 15 (basically it should be a noop which is totally fine)21:28
clarkbI think worth keeping in the doc so that we have it for future upgrades if we refer back to this one21:29
ianwyep; i think the checklist gets better each time as we pull things from the last one21:31
fungiianw: we went ahead and did up to rename step 2 earlier as well21:33
fungior i should say through rename step #221:34
ianwoh i should pull the latest change on bridge actually21:35
ianwand that playbook should run in a screen as well21:35
ianwok HEAD of that now 854a22aeae1bce7cb6ad63579af87fa1aa95656621:36
clarkbthe suggestion for tea has inspired me /me boils a kettle21:37
fungii'm mowing the lawn, which seems like an odd thing to be doing just before a maintenance window, but i'll do my best not to initiate a medical emergency in the next few minutes21:38
fungistatus notice The Gerrit service on review.opendev.org will be offline for extended periods over the next two hours for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/21:50
fungithat look good for sending in about 10 minutes?21:50
clarkbwfm21:50
ianw++21:50
fungicool, i'll record it in the pad and stage to send it just before the top of the hour21:51
fungii've started a root screen session on bridge01 and turned on session logging for it like was done for review02 in step #321:56
fungisince i didn't see one yet (though we don't need it until the rename work starts)21:56
clarkbok I guess we're operating from both hosts so have to join them separately21:56
* clarkb organizes workspaces21:56
fungiyeah, upgrade happening in one, rename in the other21:56
fungiwe're at t minus two minutes, time to send the notice?21:57
clarkb++21:57
fungi#status notice The Gerrit service on review.opendev.org will be offline for extended periods over the next two hours for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/21:58
opendevstatusfungi: sending notice21:58
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline for extended periods over the next two hours for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/21:58
fungitakes about that long to hit all the channels21:58
ianwdoes the twitter update still work or did they can that?22:00
clarkbthey made changes to the api22:00
clarkbvery likely we were impacted22:00
opendevstatusfungi: finished sending notice22:00
fungibut we do send it to mastodon still, right?22:00
ianwi guess it's still updating.  i have a change to pull it out somewhere if it breaks other things but as long as it isn't i guess we can keep it22:01
ianwanyway, i think we can start?22:01
clarkbyes I'm ready if you are22:01
clarkbI have a cup of warm tea and a wool shirt on this cold and rainy day.22:01
fungiyep22:01
fungistep #4 here we come22:01
clarkbianw: did you double check the emergency file update?22:02
clarkbjust to be sure I got the names correct etc22:02
clarkbianw: you have to use `up -d mariadb`22:02
clarkbthe down stops and deletes the containers so they aren't startable. Up is create + start22:02
clarkbor that should work too :)22:03
fungiit's 80f/27c here and uncharacteristically calm winds, so not a great day to mow the lawn, but it's supposed to rain all weekend and i'm getting on a plane monday :/22:03
clarkba week ago the forecast made it look like spring was starting this weekend. ~72F Sunday. Then a few days later spring was cancelled.22:04
fungiit's also very sunny. and ~100% humidity but that's to be expected here22:05
ianwok, watched the backup logs, all good22:05
fungiyep, lgtm22:05
fungiand gerrit is offline22:05
fungiwe have a lot of container images on there22:08
clarkbfungi: ya I did a cleanup of images older than like a year or something a while back22:08
clarkbbut we're somewhat careful with gerrit and I think that is a good thing especially since it supports reverts22:09
ianwac4763fec95aab55deafe2e1e48f0e166fb7ff59561df82cf87da15c52775b15 confirmed22:09
fungisure22:09
fungiso offline reindex time?22:09
clarkbnot sure if we automated the gerrit pruning though probably should if we havent22:10
fungithere it goes22:10
fungiand now we wait22:11
fungii'm going to do a few more laps with the mower while this spins22:11
clarkbya we might consider overriding the default cpu count for this in the future. it's like 1/2 or 1/4 of total cpus iirc22:11
fungicould probably crank it up to nearly 1:122:11
fungibut it's mostly a few large slices which have to complete, so at some point more parallelism doesn't buy us much22:12
clarkbexactly22:12
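The point about a few large slices can be made concrete with a simple lower bound: wall time can never drop below the longest single slice, however many threads are added. The sketch below is a minimal Python illustration under that assumption; the slice durations are made-up numbers, not measurements from this reindex, and the function name is hypothetical.

```python
def wall_time_lower_bound(slice_seconds, threads):
    """Best-case wall time for independent slices run on `threads` workers.

    No schedule can finish before the longest slice completes, nor faster
    than the total work divided evenly across all workers.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    total = sum(slice_seconds)
    return max(max(slice_seconds), total / threads)


if __name__ == "__main__":
    slices = [600, 300, 100, 60]  # hypothetical slice durations in seconds
    for threads in (1, 2, 4, 8):
        print(threads, wall_time_lower_bound(slices, threads))
```

With these illustrative numbers the bound stops improving once two workers are available, because the 600 second slice dominates.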
ianwi feel like we ran stats on that with the big notedb upgrade , when we had a complete mirror we were testing on22:13
fungithat certainly sounds like us22:13
clarkbianw: hrm I'm looking at the release notes and it says we need to run the init step too but that doesn't appear to be on the etherpad?22:15
clarkbhttps://www.gerritcodereview.com/3.7.html#offline-upgrade22:15
clarkbI think we've normally run that in previous upgrades so that should be captured somewhere in a previous doc?22:16
ianwhrm, yes i agree it does say that, and we haven't done that22:16
clarkbwhat I'm not sure about is if we need to index after init'ing or if order matters at all22:16
ianwhttps://etherpad.opendev.org/p/gerrit-upgrade-3.6 we didn't22:17
ianwnor22:17
ianwhttps://etherpad.opendev.org/p/gerrit-upgrade-3.522:17
genekuojust wondering, aren't the etherpad steps reviewed before the execution, or is it just some bit that was missed in review22:17
ianwwe've just missed this one in review22:18
fungigenekuo: yeah, we review the pad, also it's based (not too loosely) on prior testing22:18
clarkbianw: we did https://etherpad.opendev.org/p/gerrit-upgrade-3.322:18
genekuoI see22:19
fungibut also, it's got a lot of stuff in it, so easy to miss something22:19
clarkbline 4622:19
fungiall the more reason why more eyes are helpful ;)22:19
clarkbpart of the problem too is the gerrit upgrade process changes almost every upgrade22:19
ianwi think for an abundance of caution, we should run init after this reindex, then re-run the reindex22:19
clarkbianw: ++22:20
fungiyeah, we didn't need to reindex for 3.3, but the notes suggest that's the order we would have gone with22:20
fungithis is why we budget extra time ;)22:21
ianw++ i agree on running the same init command as in the revert docs22:23
clarkbianw: I updated the etherpad to capture this command. I pulled it from the 3.3 upgrade but it also matches what we have in the system-config ansible stuff22:23
fungithanks!22:24
ianwhrm, what was that exception22:25
ianwLoading commit AnyObjectId[9cd80009587b67757b34a278063ab98c56a06316] for ps 3 of change 19316 failed.22:25
clarkbits angry about a sub 100k change unfortunately we've got a few of those in the installation iirc22:25
ianwerror getting field added of ChangeData{Change{19321 (Id26497a7655c69b367aeee959d0078495879b1cf), dest=x/kwapi,refs/heads/master, status=M}}22:25
clarkbbasically over the years the gerrit data migrations weren't all as reliable as hoped for22:25
clarkband we've ended up with a small number of corrupted changes. I suspect because we may have manipulated them by hand in the DB for one reason or another22:26
clarkbdid the index progress halt or is that an artifact of screen copy mode scrollback?22:27
clarkbtop output implies things haven't halted22:27
clarkb1061.7s self reported time22:29
ianwit's finished, i'll do the init22:29
fungilog should indicate whether it completed22:29
fungibut yeah, i think we're good22:30
* fungi actually doesn't remember whether offline reindexes report into the error_log like online ones do22:30
clarkbah ok this migration is for our submit requirements stuff that should be a noop. We probably want to spot check say all-projects and a handful of others to ensure it was a noop22:31
clarkbI strongly suspect we didn't actually need to migrate other than for bookkeeping purposes since we took care of this explicitly upfront22:31
fungialso hopefully the first reindex primed the file cache enough to speed up the next run of it22:31
fungiseems like it started up faster at least22:33
fungia little over 25% of the time we booked, so i feel like we're in a pretty comfortable spot still22:36
ianw80%22:44
clarkbsame error again I expect it was even around 86% so the order it processes these appears to be deterministic22:45
clarkbit was 30s faster doing the changes index. Not much22:48
fungiwoohoo!22:48
clarkb17:42 wall total22:48
fungi30 seconds i can spend doing something pointless22:48
fungiand gerrit should be on its way up (but this is not the last outage)22:49
ianwhrm, what's with the replication errors22:50
clarkbianw: I understand it I think its ok22:50
clarkbor at least I Think I do22:50
clarkbit has to do with keeping state on disk but new containers wipe that out? I had changes to improve that but maybe we have a bug around it22:50
ianwoh that was mounting another dir right, i thought we merged that22:51
clarkboh ya we did22:51
fungioh, this is the not losing replication events when we restart, right22:51
clarkbya but maybe there is something wrong with it since it is so persistent. I would've expected it to flush and then move on22:51
ianwgerrit2@review02:~/review_site/data/replication/ref-updates/waiting$ ls -l | wc -l22:52
ianw971722:52
clarkbbut they appear to be unique tasks22:52
fungithat's a couple of orders of magnitude more than i would have expected22:52
clarkbso not retrying over and over again (which is good) and it stopped22:52
ianwappears to have stopped22:53
ianwheh, yeah22:53
fungiunless gerrit re-replicates everything on start22:53
clarkbI think we check if new replication tasks while gerrit is running are happy22:53
fungibut the state tracking should mean it wouldn't need to do that, so i expect that's not what created them22:53
ianwError while renaming task 02744617a81f6f87e68186e01e9d46f29c6033b7 [CONTEXT pushOneId="6fa17c8f" ]22:53
ianwjava.nio.file.NoSuchFileException: /var/gerrit/data/replication/ref-updates/waiting/02744617a81f6f87e68186e01e9d46f29c6033b7 -> /var/gerrit/data/replication/ref-updates/running/02744617a81f6f87e68186e01e9d46f29c6033b722:53
ianwso it's trying to put a waiting into running, but can't find the "waiting" file?22:54
clarkbya it sticks task records in the waiting queue then when it goes to process them it moves them. It might also be a race between different threads?22:54
clarkbianw: yes exactly22:54
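The waiting-to-running move discussed above, and how a missing source file surfaces, can be sketched as follows. This is a minimal Python illustration of the general pattern only; it is not the replication plugin's code (which is Java), and the directory layout under `base` and the function name are assumptions modelled on the paths in the error message.

```python
import os


def start_task(base, task_id):
    """Move a task file from waiting/ to running/ under `base`.

    Returns True if this caller performed the move, False if the
    waiting file was already gone (for example because another worker
    claimed it first, or because it was removed out from under us).
    """
    src = os.path.join(base, "waiting", task_id)
    dst = os.path.join(base, "running", task_id)
    try:
        os.rename(src, dst)  # atomic on the same filesystem
        return True
    except FileNotFoundError:
        # Python's analogue of the Java NoSuchFileException in the log.
        return False
```

If two workers call `start_task` for the same task, only one rename succeeds; the other sees the missing file, which is one way an error like the one above could arise.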
clarkbI think this is possibly related to the migration which has me concerned the migration actually generated meta config diffs22:54
ianwwaiting dir still has 9717 entries, so that's not moving22:55
clarkbI think two things should be checked 1) does replication occur for new events while gerrit is up and running now 2) what does meta config look like for $projects22:55
ianwok well the ui is up, that's one thing22:56
clarkbianw: if you look at the timestamps in waiting they are older than I expected. This might be something that we want to clear out between upgrades?22:57
clarkbbasically these aren't a bunch of new replication tasks from the upgrade process based on timestamps22:57
ianwi just put https://review.opendev.org/879722 in for a recheck and zuul has picked that up22:57
ianw-rw------- 1 gerrit2 gerrit2  139 Mar  2 21:17 1f9f0629cc0e76c48676ece942d7b0099ca2e25322:58
ianwyeah, they start from a long time ago...22:58
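Checking how old the leftover task files are can be sketched as below. This is a minimal Python illustration that only looks at modification times; the function name and the age threshold are assumptions, and nothing here reads or interprets the contents of the task files.

```python
import os
import time


def stale_entries(directory, max_age_seconds, now=None):
    """Return sorted names of regular files older than `max_age_seconds`."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            stale.append(entry.name)
    return sorted(stale)


if __name__ == "__main__":
    # Example: list files in the current directory untouched for over a day.
    for name in stale_entries(".", 24 * 60 * 60):
        print(name)
```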
clarkbwe should push a new change/newps and see if it replicates. I suspect the replication stuff is stale and we can ignore it for now if replication for current stuff is working22:58
clarkband then figure out why those are leaking and how to clean them up22:58
clarkbthey do not show in show-queue22:58
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing gerrit replication  https://review.opendev.org/c/opendev/system-config/+/87979022:59
ianwhttps://review.opendev.org/c/zuul/zuul-client/+/879520 is a pretty simple one if we want to merge something quick23:00
fungiwell, new changes/patchsets get replicated too23:00
fungishouldn't have to merge anything23:00
clarkbya 879790 looks good in the replication log23:00
clarkbnow to fetch it from one of the giteas23:00
clarkb`git fetch https://opendev.org/opendev/system-config refs/changes/90/879790/1` worked for me23:01
ianw++ agree23:02
clarkbso I think replication is mostly working and whatever the issue is with stale replication events is separate. We should fix that but it can happen later23:02
clarkbI think those files are meant to be json with info in them or similar serialized data we might be able to work from to debug further23:02
fungisgtm, so we can proceed i guess23:04
clarkbthey are all replication events for all-projects23:05
clarkbor at least after sampling two of them (they are json data)23:05
clarkbwe don't/can't replicate all projects because we don't let the system have permission to do so23:05
clarkbits possible this is a bug in the plugin recording those but not removing them when it finds it doesn't have permissions23:05
fungioh, yep!23:06
clarkbI was able to push a change, review the change, and verify the change is replicated. I think that all lgtm23:06
fungiso dealing with 879412 is next if everyone's ready to proceed?23:08
ianwomg my keyboard just went bananas and wouldn't stop sending "enter"23:08
fungii saw, it was epic23:08
ianwi had to rip the laptop out of the thunderbolt dock to stop it23:09
clarkbwere we happy with the diffs? was it just email templates23:09
clarkboh wow23:09
ianwsorry, back now after heart attack that gerrit was exploding somehow :)23:09
fungino worries23:09
clarkband before we proceed do we want to check the all-projects meta config history and/or another project or two?23:09
clarkbjust to see that the migration nooped as expected?23:10
ianwthere does seem to be some diff in the gitea config section23:10
clarkbianw: for replication or commentlinks?23:10
ianwjust unquoting23:10
clarkbah ok that should be safe for now23:10
fungibut just the templates right?23:10
fungioh, there23:10
clarkbfungi: the templates are purely for email23:11
fungiyeah23:11
ianwhttps://paste.opendev.org/show/bLQqwsnIoMIJOVLLAdAX/23:11
ianwit looks safe, i wonder if this isn't in the system-config test?23:11
fungimostly quoting23:11
clarkbianw: ya I think thats ok delta we can fix that later since it doesn't appear to be semantic23:11
ianwi'll add a note but it looks ok to me too23:12
fungiit's all quoting differences, yeah23:12
ianwok, i think we're about ready to call it on the gerrit screen, and things are in a stable state?23:14
clarkbas far as I can tell they are. My test change even got a +1 from zuul already23:14
clarkbmy only outstanding question is the refs/meta/config after the migration step23:14
clarkbbut voting and submit requirements also appear to be working as I expect so I don't think anything bad happened there23:15
ianwoh right, if it merged anything?23:15
fungigit log in /home/gerrit2/review_site/git/All-Projects.git looks okay23:15
clarkbyes basically did it create changes for refs/meta/config even though we did our best to prevent that from happening23:16
clarkbfungi: what branch is that looking at?23:16
fungi"Migrate label configs to copy conditions"23:16
clarkbfungi: is that from now or is that from when ianw did it?23:16
fungi8803e0f011d8892a5230fc3cd262b78481a8039b23:16
fungiit's from 22:30z23:16
clarkbok so it did make a difference?23:16
fungilooks like it23:17
clarkbif that is safe to share can you share the diff?23:17
clarkbor the full git show of that ref 23:17
ianwyeah23:17
ianwhttps://paste.opendev.org/show/biW2eeakwF2arkF0Gp3Y/23:17
fungihttps://paste.opendev.org/show/bagB8WhAD3XNJdkcIvf3/23:17
clarkbok I think that is wrong for verified and workflow23:18
clarkbbut ok for code review23:18
ianwit's added NO_CHANGE23:19
clarkbright and workflow in particular shouldn't persist that23:19
clarkbsince its about triggering events and making zuul take action etc23:19
fungilooks like it really wants changekind:NO_CHANGE everywhere23:19
clarkbIt might be ok for verified but I'd have to think about it23:19
clarkbbut I also don't think fixing that is urgent23:19
fungiwe probably want to clean it up for verified23:19
clarkbdid it make any other changes to all-projects today or just that one?23:19
fungithat was the only one in the history23:20
fungiother than our changes23:20
ianwyep prior one was "Fix boolean operators to all-caps"23:20
clarkbok I think I'm comfortable proceeding for now and coming back to cleaning that up23:20
fungiagreed23:20
clarkbit makes me wonder if it did that to all the projects as well but I think those matter less than verified and workflow so even less urgent if it did23:20
fungiright23:21
ianwwe could remove the acl cache again and re-do it23:21
clarkbya or maybe we decide NO_CHANGE is appropriate for some things and just update config on our side23:21
clarkbbut ya I think this is all ok for now if not ideal23:21
clarkbI'm happy for people to disagree with me too :) just want to communicate my comfort level with proceeding23:22
ianwNO_CHANGE is more trivial than a trivial rebase, no code change and a first parent update, hence this change kind is also matched by changekind:TRIVIAL_REBASE23:24
ianwi think i thought about this, now i look at it23:24
ianwTRIVIAL_REBASE includes NO_CHANGE23:24
clarkbianw: ya I think its fine on code-review (even if it appears to be a noop there) but for verified and workflow we expect those to reset state with new patchsets as they are used to drive state machine state23:24
clarkbbasically we never wanted to copy verified or workflow23:25
fungibut we can quickly merge a change to address that after the window23:25
clarkbyup23:25
fungiits impact should be minimal23:25
clarkb(I also think this is a bug in their migration and I'm going to file a bug about it)23:26
clarkbbut later23:26
ianwyeah i think there's two things23:26
ianw1) it added NO_CHANGE to TRIVIAL_REBASE when it didn't need to23:27
ianw2) it added NO_CHANGE when there was no copyCondition23:27
clarkb++23:27
ianw2 might be on us, but 1 feels like an unnecessary addition23:27
clarkbI mean no copy condition means no copying23:27
fungitime check: we're almost 75% of the way through our window, so now is probably a good go/no-go point for the project renames23:27
clarkbthey shouldn't assume one out of thin air. I think both are a bug in gerrit23:27
ianwi think let's do the renames?23:28
clarkbyes I'm good to proceed23:28
fungii'm in favor, shouldn't require the full remaining 30 minutes23:28
fungijust wanted to be sure23:28
ianwi think we can merge the 879412 config update later and watch it apply23:28
clarkbianw: wfm23:29
clarkbianw: but we should maybe manually edit the manage-projects command in the interim23:29
clarkbbasically do the same 3.6 -> 3.7 replacement that you did in the docker compose file23:30
clarkbin /usr/local/bin/manage-projects23:30
fungithe "docker.io/opendevorg/gerrit:3.6 manage-projects $@" line needs s/6/7/23:32
ianwok, storyboard-dev push seemed to fail, but i don't think that's a concern23:32
clarkbianw: its not but it stopped things early23:32
fungiyeah, storyboard-dev can be commented out23:32
clarkband we need to comment out everything prior to storyboard-dev23:32
fungiwe don't need to worry about renaming things on it23:32
clarkbbecause this isn't idempotent23:32
ianwok, manage-projects update done23:33
fungiwant me to go ahead and make the /usr/local/bin/manage-projects edit?23:33
fungioh, you beat me to it23:33
clarkbkeep in mind we aren't done with the rename23:33
fungiright23:33
ianwyeah, so let's edit out storyboard-dev23:33
clarkbhttps://opendev.org/opendev/system-config/src/branch/master/playbooks/rename_repos.yaml#L1-L53 all of that needs to be edited out/commented out23:34
clarkbwe can't rerun the gitea stuff safely I don't think23:34
clarkbnor the gerrit moves23:34
ianwhuh, this has ignore errors23:35
clarkbit was a connection error23:35
clarkbwhich I think is a level before the task ignore errors23:36
fungiinteresting. i ssh'd into it just before the maintenance23:36
fungimaybe we have a stale host key from the server replacement23:36
ianwoh, or it's not in our now automated list for some reason :/23:36
fungii should have tried from bridge23:36
ianwsigh, anyway, i've removed everything before line 5423:37
ianwwe feel like that's ok to run?23:37
clarkbianw: yes I have looked at the file and it looks right to me.23:37
fungiyep23:37
ianwok, attempt 2 looks good23:41
fungii concur23:41
clarkbopendev.org redirects work for me against whatever backend I balance to23:42
clarkbI have no reason to expect that other backends would be different given the status of the playbook running23:42
fungiyes, lgtm23:43
clarkbhttps://review.opendev.org/q/project:openstack/virtualpdu exists but has no changes yet (should show up via reindexing iirc)23:43
clarkbshow-queue shows the reindexing is happening23:43
fungiyep23:43
ianw++23:44
clarkbso I think we can take a quick breather and we shouldn't have any further gerrit downtime23:44
clarkbbut then followup with the project-config synchronization, emergency file updates, and then sort out the copycondition stuff23:45
ianwi can merge the two project config changes, so we can watch those fail23:45
fungisounds good23:46
fungialso, for the sake of anyone following along from the peanut gallery, we don't expect more gerrit outages for the remainder of the window, but it's not a guarantee ;)23:46
opendevreviewMerged openstack/project-config master: Ironic program adopting virtualpdu  https://review.opendev.org/c/openstack/project-config/+/87623123:47
opendevreviewMerged openstack/project-config master: Rename x/xstatic-angular-fileupload->openstack/xstatic-angular-fileupload  https://review.opendev.org/c/openstack/project-config/+/87384323:47
ianw^ those two are in23:48
clarkbianw: might be good to start a single todo list somewhere too rather than have it in the doc all over? But I'll defer to you on that23:49
clarkbI detached from the screen but didn't `exit` not sure if I would've been the last one in it23:49
ianwyeah will work it into a single todo23:51
ianwso they didn't fire manage-projects jobs after the force merge -- zuul complained that the config was invalid23:52
clarkbsee I thought that may have been the case but I looked at jobs and it seemed zuul runs after23:52
clarkbbut maybe this also saves us?23:52
clarkbfwiw the changes appear to be at https://opendev.org/openstack/project-config/commits/branch/master23:52
clarkbanother check for replication23:53
clarkbpretty sure the error would be due to the projects being invalid in gerrit now on the old side23:53
clarkband things should work when merging the third23:53
ianwi'm going to unemergency things how23:55
fungisounds good23:55
fungii assume how was now, and so a statement not a question ;)23:56
ianwi also put in https://review.opendev.org/c/opendev/system-config/+/879412 to start, that's the gerrit 3.7 config file update23:57
fungithanks23:57
ianwi'm going to merge the last change, let's see what happens23:57
fungithe others reported for their deploy fails, right?23:57
opendevreviewMerged openstack/project-config master: Rename x/ovn-bgp-agent to openstack/ovn-bgp-agent  https://review.opendev.org/c/openstack/project-config/+/87945623:58
fungior i guess they don't actually deploy because of the config errors?23:58
ianwfungi: yeah, they had config errors so didn't run23:59
ianwhttps://zuul.opendev.org/t/openstack/stream/9891dc34ec3a4406b7eb34d39c7c91ff?logfile=console.log23:59
fungieven better23:59
