Monday, 2022-04-25

*** rlandy is now known as rlandy|PTO00:12
*** ysandeep|out is now known as ysandeep03:51
fricklerianw: can you please revisit https://review.opendev.org/c/zuul/nodepool/+/834152 when you have time? we should find a solution before gtema pushes the button04:49
fricklerI also added the jammy related patches to the meeting agenda, but wouldn't mind getting reviews earlier ;)04:52
*** bhagyashris is now known as bhagyashris|ruck05:46
*** ysandeep is now known as ysandeep|afk06:01
*** ysandeep|afk is now known as ysandeep06:48
*** pojadhav is now known as pojadhav|afk07:31
*** jpena|off is now known as jpena07:35
*** pojadhav|afk is now known as pojadhav\08:26
*** pojadhav\ is now known as pojadhav08:26
*** tkajinam is now known as tkajinam|away08:33
*** rlandy|PTO is now known as rlandy10:33
*** dviroel_ is now known as dviroel11:07
*** dviroel is now known as dviroel|rover11:07
*** ysandeep is now known as ysandeep|afk12:25
*** artom__ is now known as artom13:17
opendevreviewCedric Jeanneret proposed opendev/system-config master: Use goto, chain policy and drop REJECT  https://review.opendev.org/c/opendev/system-config/+/83921013:22
opendevreviewCedric Jeanneret proposed opendev/system-config master: Use goto, chain policy and drop REJECT  https://review.opendev.org/c/opendev/system-config/+/83921013:25
opendevreviewCedric Jeanneret proposed openstack/project-config master: Use goto, chain policy and drop REJECT  https://review.opendev.org/c/openstack/project-config/+/83921213:29
*** ysandeep|afk is now known as ysandeep13:41
*** pojadhav- is now known as pojadhav14:06
opendevreviewAlbin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir  https://review.opendev.org/c/zuul/zuul-jobs/+/83922514:06
opendevreviewAlbin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir  https://review.opendev.org/c/zuul/zuul-jobs/+/83922514:09
opendevreviewAlbin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir  https://review.opendev.org/c/zuul/zuul-jobs/+/83922514:11
*** pojadhav- is now known as pojadhav14:22
corvusi'm going to begin a zuul rolling restart now14:27
*** tkajinam|away is now known as tkajinam14:42
clarkbonce I've caught up on email and system updates I'm going to look at shutting down the ELK servers. Then I'll snapshot subunit-worker01, health.o.o, logstash-worker01, logstash01, elasticsearch02 and then delete them all?14:49
clarkbfungi: ^ when you do that do you shutdown within the instance and then snapshot using osc or do you have to snapshot via the web ui?14:49
clarkbalso if anyone has reason to not do these shutdowns and deletions just yet please let me know14:51
clarkbbut I haven't seen anything that would prevent it at this point14:51
*** hrww is now known as hrw15:10
fungiclarkb: within the instance i `systemctl poweroff` and then the instance shows down in the nova api once that completes15:19
clarkbah cool. I've just gone and made a bunch of notes about servers and IP addresses and uuids. Proceeding to shut down instances. Then will sort out snapshots after15:22
fungifor xenial, systemctl may not be functional (i can't remember) but just `sudo poweroff` should also do the trick15:25
clarkbit's systemd so it seems to have worked. OpenStack hasn't caught up that they are shut down yet. But that's ok as I need to snapshot the old health and subunit worker servers first and they are long caught up15:28
fungiyeah, i usually give it a few minutes15:28
fungibut this way, if you ever have to boot a snapshot of the system it thinks it's just coming back up from a clean reboot of the original15:29
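A minimal sketch of the shutdown flow being described here, assuming OpenStack credentials are already in the environment; the server name is illustrative:

    # On the instance itself: power off cleanly so a later snapshot boots as
    # if from a clean reboot (plain `sudo poweroff` on older hosts like xenial).
    sudo systemctl poweroff

    # From a management host: wait for the Nova API to catch up and report SHUTOFF.
    openstack server show logstash-worker01.openstack.org -f value -c status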
clarkbthese are the servers I plan to snapshot: health01.openstack.org, subunit-worker01.openstack.org, logstash01.openstack.org, logstash-worker01.openstack.org, and elasticsearch02.openstack.org15:29
clarkbthat last server (es02) has a data volume attached to it which the snapshot should ignore which is what we want15:29
fungisounds good15:29
clarkbI don't intend on snapshotting all of the cluster members, are we ok with that?15:29
clarkbseems a bit overkill15:30
fungiyeah, i don't see any reason to do more than one from each cluster15:30
*** dviroel|rover is now known as dviroel|rover|lunch15:30
clarkb`osc server image create` seems to be the command to snapshot?15:31
clarkbI've got that running for health01 and subunit-worker01 now15:34
*** ysandeep is now known as ysandeep|out15:38
fungiyeah, that works, or the webui15:39
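A sketch of the snapshot step, assuming the instance is already shut down; the image name is illustrative:

    # Create a Glance image from the stopped instance. Attached data volumes
    # (like the one on elasticsearch02) are not included in the image.
    openstack server image create --name health01-final health01.openstack.org

    # Poll until the image status moves from queued/saving to active.
    openstack image show health01-final -f value -c status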
fungispeaking of volumes, we got a notification from rackspace that there's going to be a cinder maintenance in ord next week impacting a volume for the old bup backup server. i don't think we need to take any action15:40
clarkbapi reflects shutdown status for all the servers now. I'll proceed to snapshot the other 3 servers I mentioned now.15:40
clarkbnow I guess I need to wait a bit for the snapshots15:43
clarkbtrying to do a volume list I'm reminded that we need a hacked up clouds.yaml to do volume listings?15:47
clarkbaha I can override on the command line15:48
fungiclarkb: i've been unable to work out how to do it with current osc, so i just use ~fungi/launch-env/bin/cinder --os-volume-api-version=1 list (with the old-school envvars exported in the environment)15:49
clarkbya --os-volume-api 1 worked for me15:50
clarkbI don't expect the es volumes to go away automatically but wanted to be sure if they did that I had a record of them before they do go away15:51
clarkbI've got that now and can manually delete them if necessary15:51
clarkbwhere we are at in the process is waiting for snapshots to complete. Then I might have fungi or whoever else is interested do a quick look and make sure I haven't forgotten anything, then I'll proceed to instance deletions15:53
fungitrying to do `openstack --os-volume-api-version 1 volume list` i get "Version 1 is not supported, use supported version 3 instead. Invalid client version '1.0'. Major part should be '3'"15:53
clarkbfungi: oh ya I use an older install in my homedir too15:53
clarkbbecause osc removed the old api support15:53
fungiright, this is with 5.6.015:53
fungiokay15:53
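To summarize the volume-listing workaround: current python-openstackclient only speaks volume API v3, so the legacy listing has to come from an older client install. A sketch, with the path to the old client taken from fungi's example above:

    # List volumes with the old v1 API using an older cinderclient install
    # (needs the old-style OS_* environment variables exported).
    ~fungi/launch-env/bin/cinder --os-volume-api-version=1 list

    # Keeping a record of the volume IDs before deleting servers, in case
    # manual cleanup is needed later (output file name is illustrative).
    ~fungi/launch-env/bin/cinder --os-volume-api-version=1 list > ~/elk-volumes.txt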
clarkbthe first two snapshots (health01 and subunit-worker01) are done. I'm going to find breakfast while I wait on the other 315:56
*** dviroel|rover|lunch is now known as dviroel|rover16:17
clarkbfungi: ok health01, subunit-worker01/02, logstash01, logstash-worker01-20, elasticsearch02-07 are all shutdown. health01, subunit-worker01, logstash01, logstash-worker01, and elasticsearch02 appear to all have snapshot images now. Any chance you have time to double check me on that and give an all clear to begin the actual deletions?16:18
opendevreviewClark Boylan proposed openstack/project-config master: Set noop jobs on ELK puppetry to prep for retirement  https://review.opendev.org/c/openstack/project-config/+/83923516:40
opendevreviewClark Boylan proposed opendev/puppet-kibana master: Retire this project  https://review.opendev.org/c/opendev/puppet-kibana/+/83923716:43
opendevreviewClark Boylan proposed opendev/puppet-elasticsearch master: Retire this project  https://review.opendev.org/c/opendev/puppet-elasticsearch/+/83923816:46
*** jpena is now known as jpena|off16:46
opendevreviewClark Boylan proposed opendev/puppet-log_processor master: Retire this project  https://review.opendev.org/c/opendev/puppet-log_processor/+/83923916:48
opendevreviewClark Boylan proposed opendev/puppet-logstash master: Retire this project  https://review.opendev.org/c/opendev/puppet-logstash/+/83924016:50
opendevreviewClark Boylan proposed opendev/puppet-subunit2sql master: Retire this project  https://review.opendev.org/c/opendev/puppet-subunit2sql/+/83924216:52
opendevreviewClark Boylan proposed openstack/project-config master: Finalize ELK puppetry retirement  https://review.opendev.org/c/openstack/project-config/+/83924316:57
clarkbonce the servers are gone I think we're good to land ^16:58
fungiclarkb: sorry, was stuffing my face. i'll take a look now17:11
clarkbthanks. Not in a huge rush so no biggie. I found time to do other things like construct that stack of changes17:12
fungiclarkb: i see all 5 images saved, lgtm17:14
clarkbgreat, I'll proceed with deleting instances now.17:14
clarkbno objection to that right?17:15
funginone from me, thanks!17:17
clarkbthe subunit workers and health have been deleted. Now to work on the ELK servers17:19
clarkbalright all the ELK related servers are gone now. It looks like the volumes did not get auto deleted. I'll proceed to delete those manually17:25
clarkbvolume elasticsearch07.opendev.org/main01 entered an error state while deleting. I'll double check that none of the servers did that now17:28
clarkbnone of the servers entered an error state. They are gone17:29
clarkbNext on my list is dns record cleanup. Then after that the last thing I've got is what to do with the subunit2sql trove instance?17:29
clarkbfungi: ^ you may have ideas17:30
fungiwe can also snapshot that if you think the data is likely to be relevant17:30
clarkbI think the issue there is it's huge iirc17:31
clarkband ya i'm not sure how relevant it is considering no one noticed the service stopped running for quite a while17:32
fungihuge for a trove instance17:32
fungi500gb maybe?17:32
fungii personally don't think there's any point in keeping the data17:32
clarkbya I think I'm with you on that.17:32
clarkband we can just delete the trove instance17:33
fungigood and bad news on pep 686: the sc has agreed to making utf-8 mode the default, but has scheduled it to not happen until 3.517:34
fungi3.1517:34
clarkbwow that's a ways out17:40
clarkbfungi: ok all dns records (including the health.o.o and logstash.o.o CNAMEs) have been removed. Except for A records for subunit-worker01 and subunit-worker02. They just don't show up in the web ui so not sure what is going on there17:41
clarkbotherwise all the A and AAAA and CNAME records for those servers have been removed17:41
clarkbweird I exited the list view and opened the zone again and now I see those records. I'll delete them before they disappear again17:42
fungiclarkb: thanks, looks like they're no longer resolving17:43
fungiwere you scrolling through the entries or just trying to search in the browser?17:44
clarkbI was scrolling. It seemed like none of the records starting with s loaded though17:44
fungibizarre17:44
clarkbonce I reloaded the list and scrolled through they showed up17:44
clarkbI'm going to go ahead and status log here, but then can ask about the subunit2sql db in the team meeting tomorrow17:45
clarkblooks like it is using about 286GB out of 500GB max17:45
clarkb#status log The retired ELK, subunit2sql, and health api services have now been deleted.17:45
opendevstatusclarkb: finished logging17:46
clarkbalright meeting agenda is getting updated before being sent later today. Please add your content if you have any.17:54
clarkbfungi: https://review.opendev.org/q/topic:retire-elk the oldest change there, 839235, is a straightforward change that flips CI to noop jobs for the repos we'll retire, if you have time for that17:56
clarkbonce that lands I can recheck all the changes to retire content in the repos17:57
*** rlandy is now known as rlandy|mtg18:00
fungisounds great, thanks!18:00
fricklerclarkb: fungi: how about I merge https://review.opendev.org/c/opendev/system-config/+/838923 (jammy mirroring) tomorrow morning my time and watch how it goes? seems pretty low risk except possibly filling up its quota18:06
clarkbfrickler: that wfm. I think the cleanups I did should give it plenty of room18:07
fungii'm also happy to monitor it today if you'd rather have a head start on things18:08
opendevreviewMerged openstack/project-config master: Set noop jobs on ELK puppetry to prep for retirement  https://review.opendev.org/c/openstack/project-config/+/83923518:08
fricklerfungi: if you have time for that, I won't object, then I could possibly watch an image build instead ;)18:09
fungiapproved! once it deploys i'll take the lock and run reprepro without the timeout to make sure it completes18:12
fricklercool, thx18:13
fungithere's currently a reprepro run for ubuntu in progress, but i have a root screen session going on mirror-update.o.o and will grab the lock once it's released18:14
opendevreviewClark Boylan proposed opendev/system-config master: Update Gerrit build checkouts  https://review.opendev.org/c/opendev/system-config/+/83925018:14
opendevreviewClark Boylan proposed opendev/system-config master: Explicitly disable Gerrit tracing.performanceLogging  https://review.opendev.org/c/opendev/system-config/+/83925118:14
clarkbmore gerrit 4.5 prep ^18:14
clarkber3.518:14
opendevreviewMerged opendev/system-config master: Start mirroring jammy  https://review.opendev.org/c/opendev/system-config/+/83892318:38
*** rcastillo_ is now known as rcastillo19:02
*** rlandy|mtg is now known as rlandy19:03
clarkbfungi: what do you think about landing those project retirement changes now that the noop change is in place?19:34
fungisounds good to me19:37
fungii can review after i finish making dinner19:37
clarkbgreat. I'll probably pop out for a bike ride in an hour or two as well but the impact for those changes should be nil now that the servers are gone19:38
clarkbmostly trying to clean everything up so that we don't leave anything behind to confuse us later :)19:38
clarkbI've not spent nearly enough time on the bike this year. Trying to correct that.19:39
clarkbhttps://review.opendev.org/c/opendev/system-config/+/839251/1 is interesting because I managed to catch that reading the gerrit mailing list. Basically gerrit 3.5 uses more memory by default because by default it collects tracing info20:09
clarkbsince we aren't hooked up to a tracing system we can disable it and save some memory hopefully20:10
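A rough sketch of the sort of setting 839251 toggles; the config path is illustrative (in practice the file is managed via system-config templates rather than edited by hand):

    # Turn off performance trace collection (enabled by default in Gerrit 3.5);
    # without a tracing backend attached it only costs extra memory.
    git config -f "$GERRIT_SITE/etc/gerrit.config" tracing.performanceLogging false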
clarkbfungi: looks like you might have the ubuntu mirror update lock now?20:18
fungiyes20:18
clarkbat least I see a flock for it but no other processes20:18
clarkbcool20:19
fungii have the reprepro script readied in a root screen session20:19
fungiwas just waiting for the deploy results to report20:19
fungiwhich looks like it did at 19:01:0720:23
fungistarting it now20:23
fungioutput is tee'd to the usual log so it can also be seen in the screen buffer20:23
clarkbthanks20:26
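Roughly what the manual run being described looks like; the script, lock, and log paths below are placeholders rather than the real mirror-update ones:

    # Hold the per-repo lock so the periodic job can't start underneath us,
    # then run the full ubuntu reprepro update with output teed to a log.
    flock /var/run/reprepro-ubuntu.lock \
        ./reprepro-mirror-update ubuntu 2>&1 | tee -a /var/log/reprepro/ubuntu.log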
fungiERROR: Condition '437D05B5|C0B21F32' not fulfilled for '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release.gpg'.20:26
fungiSignatures in '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release.gpg':20:26
fungi'871920D1991BC93C' (signed 2022-04-25): missing pubkey20:26
fungiError: Not enough signatures found for remote repository ubuntu-security (http://security.ubuntu.com/ubuntu jammy-security)!20:27
clarkbhrm I thought ubuntu used the same key over and over? Maybe not for security?20:27
fungiguess we need to add a key20:27
fungigpg --verify /afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release{.gpg,}20:29
fungigpg: Signature made 2022-04-25T18:35:45 UTC using RSA key 0x871920D1991BC93C20:29
fungii can't seem to gpg --recv-keys 0x871920D1991BC93C20:30
fungigpg: key 0x871920D1991BC93C: new key but contains no user ID - skipped20:30
fungithat key was created in 201820:31
clarkbthey distribute the keys with apt. It's possible they just never put it on the key servers?20:31
clarkbhttps://bugs.launchpad.net/ubuntu/+source/reprepro/+bug/1968198 doesn't seem related but might end up affecting us too20:32
fungigpg --keyserver keyserver.ubuntu.com --receive-keys 0x871920D1991BC93C20:34
fungithat worked20:34
fungigpg: key 0x871920D1991BC93C: public key "Ubuntu Archive Automatic Signing Key (2018) <ftpmaster@ubuntu.com>" imported20:34
clarkbI wonder if that just means we haven't been updating our keys like we did with debian in the reprepro config management. That seems possible20:36
fungiyeah, i'm experimenting20:37
fungilooks like we use playbooks/roles/import-gpg-key/tasks/main.yaml to import each of the archive keys into the root gnupg keyring?20:42
clarkbyes, we keep an ascii armored version of the pubkey in the role and those tasks iterate over them and install them20:43
fungiyep, just making sure. so if i really want to test this, i'll end up bypassing that role20:43
fungii'll just propose the change i think it needs20:44
clarkbI think you also need to list the key fingerprint in the reprepro configs20:44
clarkbI'm going to work on getting out for that exercise now. Will check on that when I get back. I don't think it's a big deal if you manually toggle it and also push a change we land next20:46
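A rough sketch of the two pieces a change like this likely needs, based on the discussion above; the exact file names in system-config are assumptions:

    # Fetch the 2018 Ubuntu archive signing key and export it ASCII-armored so
    # it can be added to the import-gpg-key role's key files.
    gpg --keyserver keyserver.ubuntu.com --receive-keys 0x871920D1991BC93C
    gpg --armor --export 0x871920D1991BC93C > ubuntu-archive-2018-key.asc

    # The reprepro updates config then needs to accept the new key as well,
    # e.g. extending the existing condition along these lines (illustrative):
    #   VerifyRelease: 437D05B5|C0B21F32|871920D1991BC93C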
opendevreviewJeremy Stanley proposed opendev/system-config master: Add Ubuntu's 2018 Archive Signing Key to reprepro  https://review.opendev.org/c/opendev/system-config/+/83926120:51
fungiyeah, i think i got everything in ^20:52
fungii'm around all night, so happy to just wait for that to land and deploy and then try again20:53
ianwfrickler: thanks for working on jammy things.  in answer to your prior question on why we use ntpdate/chrony/systemd-timesync/* the answer is pretty much I don't know and will have to context switch it back in :)21:21
ianwi think we kind of make decisions that seem right at the time, but it's always worth revisiting as the world turns21:22
fungiin the past we've oscillated between taking whatever the platform provides by default vs overriding platform defaults in order to drive consistency across different node labels21:28
fungiand this is yet another of those situations21:28
opendevreviewMerged opendev/system-config master: Add Ubuntu's 2018 Archive Signing Key to reprepro  https://review.opendev.org/c/opendev/system-config/+/83926121:58
*** rlandy is now known as rlandy|bbl22:16
fungiand it's deployed, so trying again22:20
fungiseems to have gotten past the prior error22:22
*** dviroel|rover is now known as dviroel|rover|afk22:36
clarkbfungi: have you run into the error in the lp bug I linked?22:43
funginope22:43
clarkbcool hopefully that got fixed one way or another22:43
fungiwith 839261 deployed it's well into pulling down packages now22:43
clarkbianw: did you see frickler was asking if you could follow up on https://review.opendev.org/c/zuul/nodepool/+/834152 ? I think his suggestion is that you push a new patchset to make the change you are asking for to avoid any confusion22:43
fungiprobably will still be going for hours but i'll try to keep an eye on it over the course of my evening22:44
clarkbthanks. My bike ride was fun. I went out and it was decent weather. It's sunny now. But for about 45 minutes of my bike ride the skies decided torrential downpour and hail would be appropriate22:47
clarkbI'll send out the meeting agenda in a few minutes if there is anything else to add let me know22:52
clarkbI guess our afs graphs track the RO volumes and not RW so we won't see progress via disk usage23:05
fungiyeah, not until it finishes23:06
ianwit might show the rw volume, i don't think it explicitly doesn't at least ...23:33
clarkbit's not a big deal, I was just hoping to see a slowly increasing disk utilization graph to estimate progress23:34
ianwi think it might pull the r/o into a different stat https://opendev.org/opendev/afsmon/src/branch/master/afsmon/__init__.py#L8223:34
ianwyeah the readonly ones are like mirror_fedora_readonly23:37
ianwand the stats page shows the r/w volumes.  it's interesting because i'm not sure if that naming is a feature or a bug23:39
clarkbhuh it does show a small bump now to 692GB23:40
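For watching the read/write volume directly instead of the graphs, something like this should work from a host with the AFS tools, assuming the mirror volume naming discussed above (the exact volume name is a guess):

    # The RW volume grows as reprepro writes; the .readonly copy only changes
    # after the volume is released.
    vos examine mirror.ubuntu
    vos examine mirror.ubuntu.readonly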
clarkbalso I've realized that ubuntu ports for arm64 is a separate volume so we may not have room to do those just yet. That said I just cleared out 6TB of elasticsearch volumes. Maybe we should allocate 2TB back to AFS23:40
ianwit was only individual volumes exceeding 2tb that was the issue, wasn't it?  when we had our on-disk pypi mirror23:42
clarkbyes I believe so23:43
clarkbpretty sure we can go to 3TB total then keep individual volumes under 2TB23:43
clarkbin this case I say 2TB to afs because 1TB to each dfw server23:43
ianwit would complicate things to have a vicepb i guess, but speaking from experience if this volume needs to fsck it's nerve wracking23:46
fungioh, right, because it's a virtual fs built on files on another fs23:47
fungii always forget it's not just backed by a raw block device23:47
corvusas a point of interest -- there is once again a single job running for an excessive amount of time that's holding up the zuul rolling restart.  this time it's the nova-live-migration job and it's stuck running the opendev.org/opendev/base-jobs/playbooks/base/cleanup.yaml playbook23:53
corvusi think that playbook may not deal with systemic node connection problems well :/23:54
