19:01:18 <clarkb> #startmeeting infra
19:01:20 <openstack> Meeting started Tue Mar 30 19:01:18 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:23 <openstack> The meeting name has been set to 'infra'
19:01:27 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-March/000199.html Our Agenda
19:01:57 <diablo_rojo> o/
19:01:57 <clarkb> I wasn't around last week, but will do my best :) feel free to jump in and help keep things going in the right direction
19:02:02 <ianw> o/
19:02:59 <clarkb> #topic Announcements
19:03:33 <clarkb> I didn't have any. Do others?
19:03:52 <fungi> i don't think so
19:03:59 <fungi> gitea was upgraded
19:04:07 <fungi> keep an eye out for oddities?
19:04:27 <clarkb> ++
19:04:27 <fungi> zuul was recently updated to move internal scheduler state into zookeeper
19:04:37 <fungi> keep an eye on that too
19:05:11 <clarkb> #topic Actions from last meeting
19:05:19 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-23-19.01.txt minutes from last meeting
19:05:32 <clarkb> ianw had an action to start asterisk retirement. I saw an email to service-discuss about it.
19:05:53 <ianw> no response on that, so i guess i'll propose the changes soon
19:06:06 <clarkb> ianw: do you want to keep the action around until the changes are up and/or landed? seems to be moving along at least
19:06:20 <ianw> sure, make sure i don't forget :)
19:06:36 <clarkb> #action ianw Propose changes for asterisk retirement
19:06:47 <clarkb> #topic Priority Efforts
19:06:54 <clarkb> #topic OpenDev
19:07:03 <clarkb> as mentioned we upgraded gitea from 1.13.1 to 1.13.6
19:07:31 <clarkb> keep an eye out for weirdness.
19:07:48 <clarkb> Do we also want to reenable project description updates and see if 1.13.6 handles that better? or maybe get the token usage change in first?
19:08:34 <ianw> tokens seem to maybe isolate us from any future hashing changes, but either way i think we can
19:09:04 <clarkb> ianw: maybe I should push up the description update change again and then compare dstat results with and without the token use.
19:09:20 <clarkb> that should give us a good indication of whether 1.13.6 has improved hashing enough?
19:09:25 <fungi> maybe
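[As a rough sketch of the comparison clarkb describes — the flags and output filenames here are illustrative, not the exact invocation used on the gitea backends — dstat can log the same counters to CSV for both runs:]

```
# sample cpu/memory/load every 5 seconds while exercising description
# updates with token auth enabled, then repeat with it disabled
dstat --time --cpu --mem --load --output with-tokens.csv 5
dstat --time --cpu --mem --load --output without-tokens.csv 5
```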
19:09:54 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/782887
19:09:54 <fungi> it was never a complete smoking gun that project management changes triggered the cpu load
19:09:54 <ianw> for anyone reading without context :)
19:10:09 <fungi> they would sometimes overload *a* gitea backend and the rest would be perfectly happy
19:10:25 <clarkb> ya I suspect it has to do with background load as well
19:10:32 <fungi> so if we want to experiment in that direction, we'll need to leave it in that state for a while and it's not a surety
19:10:36 <clarkb> due to the way we load balance we don't necessarily get a very balanced load
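[For background on why tokens should help: with HTTP basic auth gitea must verify the password hash on every API request, and that hash is deliberately CPU-expensive, while a token check is cheap by comparison. A minimal sketch of the flow — backend URL and credential variable names are placeholders, not our real configuration:]

```
# create a token once, authenticating with the admin password
curl -s -u "$GITEA_USER:$GITEA_PASSWORD" \
  -X POST -H "Content-Type: application/json" \
  -d '{"name": "project-management"}' \
  "$GITEA_URL/api/v1/users/$GITEA_USER/tokens"
# subsequent project management calls send the token instead,
# skipping the per-request password hash verification
curl -s -H "Authorization: token $GITEA_TOKEN" \
  "$GITEA_URL/api/v1/orgs/opendev/repos"
```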
19:11:50 <clarkb> I also made some new progress on the gerrit account classification process before taking time off
19:12:10 <clarkb> if you can review groups in review:~clarkb/gerrit_user_cleanups/notes.20210315 and determine if they can be safely cleaned up like previous groups that would be great
19:12:23 <clarkb> I'll pick that up again once others have had a chance to cross check my work
19:12:29 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/780663 more user auditing improvements
19:12:41 <clarkb> that is a related scripting improvement. Looks like I have one +2 so I may just approve it today
19:13:01 <clarkb> essentially I had the scripts collect a bunch of data into yaml then I could run "queries" against it to see different angles
19:13:12 <clarkb> the different angles are written down in the file above and can be cross checked
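[A minimal sketch of that query pattern, assuming the audit script dumps one record per account into a yaml file — the filename and field names here are hypothetical:]

```python
import yaml

with open("gerrit-accounts.yaml") as f:
    accounts = yaml.safe_load(f)

# one "angle": emails shared by more than one account
by_email = {}
for account in accounts:
    for email in account.get("emails", []):
        by_email.setdefault(email, []).append(account["id"])

for email, ids in sorted(by_email.items()):
    if len(ids) > 1:
        print(email, ids)
```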
19:14:31 <clarkb> #topic Update Configuration Management
19:14:42 <clarkb> Any new config mgmt updates we should be aware of/review?
19:16:08 <fungi> i don't think so
19:16:19 <clarkb> #topic General Topics
19:16:30 <clarkb> #topic Server Upgrades
19:16:59 <clarkb> I did end up completing the upgrades for zuul executors and mergers and nodepool launchers
19:17:09 <clarkb> That leaves us with the zookeeper cluster and the scheduler itself
19:17:25 <clarkb> I have started looking at the zk upgrade and writing notes on an etherpad
19:17:26 <clarkb> #link https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021
19:18:02 <clarkb> that etherpad proposes two options we could take to do the upgrade. If y'all can review it and make sure the plans are complete and/or express an opinion on which path you would like to take I can boot instances and keep pushing on that
19:20:02 <clarkb> #topic Deploy new refstack server
19:20:10 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/781593
19:20:26 <clarkb> this change merged yesterday. ianw: should I go ahead and remove this item from the meeting agenda?
19:20:51 <ianw> yep, deployment job ran so i'm not aware of anything else to do there
19:21:01 <clarkb> cool I'll get that cleaned up
19:22:15 <clarkb> #topic PTG Planning
19:22:31 <clarkb> I did submit a survey and put us on the schedule last week
19:22:52 <clarkb> the event runs April 19-23 and I selected Thursday April 22 1400-1600UTC and 2200-0000UTC for us
19:23:16 <clarkb> the first time should hopefully work for those in EU timezones and the second for those in asia/pacific/australia
19:23:45 <clarkb> my thought on that was we could do office hours and try to help some of our new project-config reviewers get up to speed or help other projects with infra related items
19:24:31 <clarkb> if the times just don't work or you think we need more or less let me know. I indicated we may need to rearrange scheduling when I filled out the survey
19:24:51 <clarkb> #topic docs-old volume cleanup
19:25:14 <clarkb> not sure if this is still current but it was on the agenda so here it is :)
19:25:52 <ianw> oh it was from when i was clearing out space the other day
19:26:05 <ianw> do we still need docs-old?
19:26:39 <fungi> we do not
19:26:47 <clarkb> is docs-old where we stashed the really old openstack documentation so that it could be found if people have really old installations but otherwise wouldn't show up in google results?
19:27:12 <fungi> that was kept around for people to manually copy things from if we failed to rebuild them during the transition to zuul v3
19:27:39 <fungi> i think anything we weren't actively building but was relevant was manually copied to the docs volume
19:27:47 <ianw> clarkb: yeah, the concern was it leaking into google via https://static.opendev.org/docs-old/ which i guess has nothing to stop that
19:28:08 <ianw> ok, well it sounds like i can remove it then
19:28:10 <fungi> we should probably robots.txt to exclude spiders from the whole static vhost
19:28:38 <clarkb> would it make sense to see if Ajaeger has an opinion?
19:28:46 <clarkb> since Ajaeger was pretty involved in that at the time iirc
19:29:44 <ianw> fungi: yeah, i can propose that.  everything visible there should have a "real" front-end i guess
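[A robots.txt excluding spiders from the whole static.opendev.org vhost would be a two-line file served at the site root; whether to exclude the whole vhost or just /docs-old/ is the open question:]

```
# served at https://static.opendev.org/robots.txt
User-agent: *
Disallow: /
```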
19:31:20 <clarkb> I don't have enough of the historical context to make a decision. I'll defer to others, but suggest maybe double checking with ajaeger if we can
19:31:57 <ianw> ok, i can ask, don't want to bother him with too much old cruft these days :)
19:32:21 <clarkb> ya I don't think ajaeger needs to help with cleanup or backups or anything, just indicate if he thinks any of it is worth saving
19:32:51 <clarkb> #topic planet.openstack.org
19:33:05 <clarkb> Another one I don't have a ton of background on but I see a retire it option and I like the sound of that >_>
19:33:23 <clarkb> looks like the aggregator software is not being maintained anymore which puts us in a weird spot for doing server updates
19:33:26 <ianw> yeah, linux australia retired their planet which made me think of it
19:33:40 <fungi> i guess we should probably at least let the folks using it know somehow
19:33:45 <fungi> like make an announcement
19:33:58 <clarkb> ++ and probably send that one to openstack-discuss given the service utilization
19:34:00 <ianw> i did poke at aggregation software, i can't see any that are python3 and maintained
19:34:00 <fungi> i could get the foundation to include a link to the announcement in a newsletter
19:34:16 <clarkb> basically say the software is not maintained and we can't find alternatives. We will retire the service as a result.
19:34:23 <ianw> i thought we could replace it with a site on static that has an OPML of the existing blogs if we like
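[For reference, an OPML file is just an XML listing of feed URLs that feed readers can import; a minimal sketch with a placeholder feed entry:]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head><title>OpenStack community blogs</title></head>
  <body>
    <!-- one outline element per blog; this url is illustrative -->
    <outline type="rss" text="Example blog"
             xmlUrl="https://blog.example.org/feed.xml"/>
  </body>
</opml>
```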
19:34:37 <ianw> these days, an RSS-to-twitter feed would probably be more relevant anyway
19:34:38 <fungi> or if the foundation sees benefit in it, they may have a different way they would want to do something similar anyway
19:34:55 <fungi> yeah
19:35:27 <fungi> microblogging sites have really become the modern blog aggregators anyway
19:35:57 <ianw> (i did actually look for an rss to twitter thing too, thinking that would be more relevant.  nothing immediately jumped out, a bunch of SaaS-type things)
19:36:05 <clarkb> ya twitter, hacker news, reddit etc seem to be the modern tools
19:36:24 <clarkb> and authors just send out links from their accounts on those platforms
19:36:26 <ianw> vale RSS, RIP with google reader
19:37:10 <ianw> maybe give me an action item to remember and i can send that mail and start the process
19:38:31 <clarkb> #action ianw Announce planet.o.o retirement
19:38:42 <ianw> i am old enough to remember when jdub wrote and released the original "planet" and we all thought that was super cool and created a bunch of planets
19:39:02 <clarkb> #topic Tarballs ORD replication
19:39:26 <ianw> ok, last one, again from clearing out things earlier in the week
19:40:04 <ianw> of the things we might want to keep if a datacentre burns down, i think tarballs is pretty much the only one not replicated?
19:40:10 <ianw> #link https://etherpad.opendev.org/p/gjzssFmxw48Nn3_SBVo6
19:40:13 <ianw> that's the list
19:41:09 <ianw> docs is already replicated
19:41:15 <clarkb> ++ I think the biggest consideration has been that the vos release to a remote site of large sets of data isn't quick
19:41:23 <clarkb> I think tarballs is not as large as our mirrors but bigger than docs?
19:41:33 <clarkb> I also suspect that we can set it up and see how bad it is and go from there?
19:41:37 <fungi> yeah, in that ballpark
19:41:51 <fungi> also the churn is not bad as it's mostly append-only
19:42:05 <fungi> or at least that's the impression i have
19:42:17 <fungi> i guess we'll find out if that's really true
19:42:36 <ianw> yeah, i don't think it's day-to-day operation; just recovery situations
19:42:39 <ianw> which happen more than you'd hope
19:43:05 <ianw> but still, i'd hate to feel silly if something happened and we just didn't have a copy of it
19:44:07 <clarkb> ya I think this is the sort of thing where we can make the change, monitor it to see if it is unhappy and go from there
19:44:12 <ianw> ORD has plenty of space.  we can always drop the RO there in a recovery situation i guess too, if we need
19:44:27 <ianw> alright, i'll set that up.  lmk if you think anything else in that list is similar
19:44:29 <clarkb> I want to say the newer openafs version we upgraded to is better about higher latency links?
19:44:59 <ianw> apparently, but still there's only so fast data gets between the two when it's a full replication scenario
19:45:40 <clarkb> ianw: maybe do all the project.* volumes?
19:46:08 <clarkb> I think those host docs for various things like zuul and starlingx
19:46:25 <clarkb> mirror.* shouldn't matter and is likely to be the most impacted by latency
19:46:46 <ianw> yeah, probably a good idea.  i can update the docs for volume creation because we've sometimes done it and sometimes not it seems
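[Adding the replica comes down to a couple of vos commands; the server, partition, and volume names below are placeholders for whatever the ORD fileserver and tarballs volume are actually called:]

```
# register a read-only site for the volume on the ORD fileserver
vos addsite afs01.ord.example.org a project.tarballs
# push the current RW contents to all RO sites; after the first full
# copy, releases are incremental, which is why append-mostly volumes
# are cheap to keep replicated
vos release project.tarballs
```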
19:46:55 <clarkb> ++
19:47:24 <fungi> sure, small volumes are probably good to mirror more widely if for no other reason than we can, and they're one less thing we might lose in a disaster
19:48:12 <ianw> yeah, it all seems theoretical, but then ... fires do happen! :)
19:49:29 <clarkb> indeed
19:49:37 <clarkb> #topic Open Discussion
19:49:47 <clarkb> That was all on the published agenda
19:49:59 <ianw> i have a couple of easy ones from things that popped up
19:50:02 <clarkb> worth noting we think we have identified a zuul memory leak which is causing zk disconnects
19:50:11 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/782868
19:50:15 <ianw> stops dstat output to syslog
19:50:24 <clarkb> fungi was going to restart the scheduler to reset the leak and keep us limping along. corvus mentioned being able to actually debug tomorrow
19:50:31 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/783120
19:50:40 <ianw> puts haproxy logs into our standard container locations
19:50:59 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/782898
19:51:06 <clarkb> ianw: the dstat thing is unexpected but change lgtm
19:51:09 <ianw> allows us to boot very large servers when they are donated to us :)
19:51:28 <clarkb> ha on that last one
19:52:10 <fungi> yeah, we're a few minutes out from being able to restart the scheduler without worrying about openstack release impact
19:52:26 <fungi> i'm just waiting for one build to finish updating the releases site
19:52:34 <ianw> is it helpful to restart with a debugger or anything for the leak?
19:53:10 <fungi> oh, clarkb, that oddity we were looking at with stale gerritlib used in a jeepyb job? it happened again when i rechecked
19:53:17 <ianw> clarkb: yeah, i was like "i'm sure i provided a reasonable size for boot from volume ... is growroot failing, etc. etc." :)
19:53:33 <clarkb> ianw: I want to say we already have a hook to run profiling on object counts
19:53:42 <clarkb> ianw: but that is a good question and we should confirm with corvus before we restart
19:53:49 <corvus> i have not previously used a debugger when debugging a zuul memory leak; only the repl and sigusr
19:54:07 <corvus> i'm always open to new suggestions on debugging memleaks though :)
19:54:20 <clarkb> seems like the repl stuff and getting object counts has been really helpful in the past at least
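[The object-count approach corvus and clarkb mention can be driven from the repl with the objgraph library — assuming it is importable in the scheduler's environment:]

```python
# run from the scheduler's repl
import objgraph

# snapshot the most common live object types
objgraph.show_most_common_types(limit=20)

# call again later; show_growth() prints only the types whose counts
# increased since the previous call, which is what fingers the leak
objgraph.show_growth()
```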
19:56:38 <clarkb> corvus: when I've tried in the past it's been "fun" to figure out adding debugging symbols and all that. I suspect that since we use a compiled python via docker this may be even more fun?
19:56:49 <clarkb> we can't just install the debugger symbols package from debian
19:57:05 <clarkb> (sorting that out may be a fun exercise for someone with free time though as it may be useful generally)
19:57:25 <clarkb> sounds like this may be about it. I can end here and we can go have breakfast/lunch/dinner :)
19:57:29 <clarkb> thank you everyone!
19:57:31 <clarkb> #endmeeting