19:01:29 <clarkb> #startmeeting infra
19:01:30 <openstack> Meeting started Tue Oct  6 19:01:29 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:33 <openstack> The meeting name has been set to 'infra'
19:01:42 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-October/000102.html Our Agenda
19:01:49 <clarkb> #topic Announcements
19:02:10 <clarkb> PTG and Summit happen this month. Register now if you haven't already and plan to attend (it is free to register)
19:02:29 <clarkb> The OpenStack release happens next week which we should keep in mind for service changes
19:02:47 <clarkb> and finally Rax hosted db outages around 03:00-05:00 UTC Friday including those for review and grafana
19:03:52 <clarkb> Thats thursday evening around here and Friday morning for ianw I think. I'll try to be around so that ianw isn't the only one at a keyboard if gerrit or grafana get sad
19:04:59 <ianw> heh, yeah that's a good time for .au for things to go wrong :)
19:05:17 <clarkb> #topic Actions from last meeting
19:05:24 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-29-19.01.txt minutes from last meeting
19:05:28 <clarkb> We recorded no actions
19:05:33 <clarkb> #topic Priority Efforts
19:05:41 <clarkb> #topic Update Config Management
19:05:50 <clarkb> Any configuration management updates to call out?
19:06:36 <fungi> graphite got swapped out and the old server deleted thanks to ianw's tireless effort
19:07:01 <ianw> well i haven't deleted the old server yet, but it has been a week so i'll probably do it soon (on my todo list :)
19:07:25 <fungi> oh, right, deleted it from the inventory
19:08:02 <ianw> hrm, it looks like that got a -2 for a job that passed in check but failed gate ... https://review.opendev.org/#/c/755150/  ... will look into that
19:10:05 <clarkb> #topic OpenDev
19:10:52 <clarkb> Lets start with gitea. We discovered that setting descriptions errors out in a very small portion of requests. I've made the description update non fatal in our repo management; if it fails, we'll get to it on another pass (it is still fatal if project creation breaks)
19:11:11 <clarkb> I also updated the gitea config to log tracebacks on those errors so that if it happens again we can hopefully debug it further than "it broke"
19:11:26 <corvus> so it intermittently fails?
19:11:33 <clarkb> I have confirmed the new config setting is functional and we get tracebacks when starting gitea and the mysql server isn't listening on tcp yet
19:11:45 <clarkb> corvus: yes the error message we get indicates the cache returns an unexpected nil value
19:11:50 <corvus> like, it's not that some project has a pile of poo emoji in the description and that one always fails?
19:11:51 <corvus> k
19:11:54 <clarkb> correct
19:12:26 * ianw has a new idea for the dib project description :)
19:12:34 <fungi> though i can think of a few of my projects which probably deserve a pile of poo as a description
19:12:43 <fungi> ianw beat me to it
19:14:15 <clarkb> On the gerrit upgrade side of things fungi and I have been doing pair programming ops style and got review-test into shape mimicking a gerrit 2.13 as of october 1. We then upgraded it to 2.16 pre notedb migration using the process described at https://etherpad.opendev.org/p/gerrit-2.16-upgrade
19:14:28 <clarkb> That server is now up and running and you can test it at https://review-test.opendev.org
19:14:49 * diablo_rojo sneaks in late
19:15:04 <fungi> we also timed the relevant steps so we can gauge how much downtime we'll likely incur for them
19:15:07 <clarkb> The next thing I'm working on is the notedb migration and 3.x upgrades. I've found that we need 3.1 and 3.2 images so am working on that next. While doing that I expect we'll leave 2.16 up like that so people can interact with it
19:15:38 <clarkb> the reason for the notedb migration and 3.x stuff happening next is we want data to determine if we should do a 2.13 -> 3.x upgrade or just 2.13 -> 2.16 then later 2.16 -> 3.x
19:15:59 <clarkb> As expected things like our hacky js CI results table do not work in either the old ui or the polygerrit ui on gerrit 2.16
19:16:05 <fungi> also if folks have any observations/concerns about the 2.16 deployment there, please add them to the notes section at the bottom of the etherpad
19:16:28 <clarkb> more surprisingly the zuul comment commentlink config does not work under polygerrit and from what I've been able to debug cannot be made to work without changing how zuul comments
19:16:44 <corvus> huh :(
19:16:57 <clarkb> the issue being that gerrit finds the url that zuul posts and treats that separately to other strings which you can regex match on
19:17:03 <fungi> either things we'll need to address before the production upgrade, or things we might want to look into fixing after upgrading
19:17:10 <clarkb> https://gerrit.googlesource.com/gerrit/+/refs/heads/stable-2.16/polygerrit-ui/app/elements/shared/gr-linked-text/link-text-parser.js#268 is the code that does that in gerrit
19:17:30 <clarkb> we could fix that by having zuul not leave comments with urls then do comment link rewrites to urls ourselves
19:17:37 <fungi> yeah, basically no commentlink matching if the pattern covers an existing url in the content
19:17:42 <clarkb> I also checked that code through 3.2 and I believe its just as broken there as 2.16
19:18:08 <clarkb> in positive news the upgrade itself went pretty smoothly
19:18:10 <corvus> well, zuul needs to leave usable comments without commentlinks
19:18:17 <corvus> so i think they have to have urls
19:18:22 <clarkb> corvus: ya and it does, so I'm ok with just ignoring that
19:18:33 <clarkb> they aren't as pretty but they are usable imo
19:18:34 <fungi> i also don't think the commentlinks really buy us a lot for the zuul comments anyway
19:18:55 <clarkb> another neat thing is gerrit annotates comments that leave -1's and -2's so the zuul comments jump out in the comment list
19:19:02 <corvus> so the issue is that we can't modify the link to replace the anchor text with the job name?
19:19:03 <clarkb> making it easy to find them and figure out what is broken
19:19:17 <clarkb> corvus: we can't have a regex that matches a url
19:19:34 <clarkb> corvus: and the end result is what you describe
19:19:54 <corvus> well, i'm unclear about whether the problem is that we can't match a url or we can't modify the anchor text
19:20:01 <clarkb> I can match either side of the url but not the url itself
19:20:24 <corvus> like, if the comment were "jobname {url}" would we be able to match the interior of the {} and change the anchor text of the resulting url?
19:20:34 <fungi> and it's matches on the raw comment string minus the url, not the baked comment html
19:21:16 <clarkb> corvus: that's a good question, we can test that by leaving some comments manually and trying to match them with commentlinks
19:21:46 <clarkb> corvus: I expect that if we manually leave such a comment and it doesn't get rendered to an href then we could use commentlinks for that but if the inner bit of {} gets rendered to href we can't
19:21:53 <fungi> we may however wind up with an href object string inside an escaped version of our attempt at adding an href object, but worth a try
19:22:27 <corvus> i think it would be in scope to update zuul's comment syntax to make it parseable with commentlinks in newer gerrit, i think the only restriction is we need to make it sensible without commentlinks
19:22:36 <fungi> &lt;a href="<a href="https:...
19:23:02 <fungi> is kinda what i'm expecting that to result in
19:23:23 <corvus> i can try to help with this
19:23:35 <clarkb> corvus: thanks, feel free to edit the config and reload on review-test
19:23:41 <corvus> k
19:23:45 <clarkb> I think you can do that without restarting gerrit on 2.16 but I haven't confirmed that yet
19:23:59 <corvus> fancy
19:24:10 <fungi> but also restarting the container shouldn't be much of a concern
19:24:22 <clarkb> corvus: and note polygerrit commentlinks are parsed on the raw string not the rendered html
19:24:32 <clarkb> that was the first thing I had to sort out
19:24:58 <clarkb> on the image building side of things I'm running into issues with jgit being a submodule in gerrit now. But will have a patchset after the meeting to see if I've figured that out
19:25:20 <corvus> i thought we used zuul required-projects to set that up?
19:25:43 <clarkb> corvus: we do except for like ~3 repos because they don't have branches/tags/refs to checkout and they just use submodule pointing at specific refs
19:25:54 <clarkb> corvus: jgit was only converted to a submodule in 3.1 which is why its different
19:26:11 <clarkb> but also they don't use refs that we can easily checkout and just have it on specific commits :/
19:26:28 <clarkb> looks like mordred stubbed support for this out and I just need to tie it into the new jobs I'm creating
19:27:02 <corvus> might be worth a look at the zuul jobs in upstream gerrit too, maybe we did something there we can backport
19:28:25 <clarkb> As mentioned earlier the upgrade process went really smoothly and would probably only take a day to get to 2.16 pre notedb
19:28:35 <clarkb> considering all the moving parts thats pretty good
19:29:06 <fungi> aggregate time to do the gc passes and reindexing is only a few hours
19:29:18 <fungi> which is way better than i anticipated
19:29:45 <clarkb> this is why I'm now looking at 3.x as it isn't entirely far fetched to think we could make that leap in one go
19:29:46 <fungi> and the db migrations
19:29:53 <corvus> clarkb: do we have this locally?  https://gerrit.googlesource.com/zuul/jobs/+/refs/heads/master/roles/prepare-gerrit-repos/tasks/repo.yaml#53
19:30:22 <clarkb> corvus: ya mordred's stubbed out thing is similar
19:30:33 <clarkb> corvus: except it hardcodes the things to submodule init on
19:30:48 <corvus> k; we might be able to just move to the upstream one?
19:31:30 <clarkb> ya that may be a good refactor
19:32:24 <clarkb> the last thing I wanted to note here is that I think we should be careful about trying to fix everything pre upgrade otherwise we may never upgrade :) if we find issues and identify which ones are important to fix that would be helpful so that we can prioritize them as much as possible
19:32:54 <clarkb> its-storyboard for example hasn't been updated in a long long time
19:33:15 <corvus> ++ i bet we can live with weird commentlinks :)
19:34:13 <clarkb> but ya please do give that server a good test. I've dumped the db and snapshotted the cinder volume on 2.13 and 2.16 already so we can roll back pretty easily
19:34:26 <clarkb> Anything else to bring up on the topic of opendev before we move on?
19:34:42 <fungi> its storyboard is a good example of something we could probably just replace with a zuul job too
19:34:48 <clarkb> fungi: ++
19:34:50 <fungi> er, its-storyboard
19:35:28 <fungi> also i don't think we've tested whether the hook scripts will continue to work
19:35:38 <corvus> clarkb: maybe send out an email to service-discuss to ask folks to poke at review-test?
19:35:45 <fungi> (for things like launchpad bug updates and welcome new contributor message)
19:35:47 <clarkb> corvus: oh ya I can do that too
19:35:50 <ianw> just a big thanks to clarkb and fungi for working on it!!!!
19:36:17 <corvus> ianw: ++  thanks clarkb and fungi and also luca :)
19:36:27 <clarkb> and mordred for laying the groundwork
19:36:36 <fungi> yes, luca's guidance has been invaluable
19:36:58 <fungi> also if you say mordred three times he might appear
19:37:02 * mordred hands out candy and goats to everyone in celebration
19:37:16 <fungi> oh, two times! ;)
19:37:44 <mordred> fungi: that was actually three ;)
19:37:44 <clarkb> #topic General topics
19:37:56 <corvus> mordred: getting ready for halloween nola style? :)
19:38:01 <clarkb> #topic PTG Planning
19:38:02 <fungi> mordred: yeah, you're right (four now!)
19:38:17 <clarkb> corvus: for some reason I just assume they hand out cocktails instead of candy
19:38:25 <mordred> corvus: does buying pig feet count?
19:38:48 <clarkb> #link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 PTG Planning and details for OpenDev here
19:38:49 <corvus> mordred: pretty sure that's just 'tuesday' not halloween
19:39:20 <fungi> in my old neighborhood the russian family across the street from us set up a table and handed out vodka shots to the parents dragging their costumed toddlers around. a public service
19:39:21 <clarkb> I've scribbled notes on that etherpad. As mentioned before please indicate if you'd like to be a part of specific discussions and we'll do our best to accommodate with timezones
19:39:41 <corvus> those look like pretty good times
19:39:44 <clarkb> fungi: and that was in nola?
19:39:54 <fungi> nope, raleigh
19:40:21 <clarkb> oh also fungi and ttx and diablo_rojo have indicated they have a million ptg things and if you are in that boat too and have conflicts with opendev let us know
19:40:33 <clarkb> I think we can do minor tweaks to the schedule to accommodate
19:40:44 <fungi> i expect to just have to float in and out of sessions where conflicts arise
19:41:09 <clarkb> #topic Rehoming tarballs
19:41:20 <fungi> and rely on folks to ping me in irc if they need me in a particular discussion and i'm not in the right meetpad
19:41:25 <clarkb> ianw: I kept this on the agenda in case there was anything more to say about this, but I think its been taken care of?
19:43:38 <clarkb> sounds like maybe no
19:43:50 <ianw> yeah, sorry
19:43:56 <clarkb> tldr is the tarballs were moved to their proper homes and apache redirects were added for people that had old urls
19:44:13 <ianw> i have some follow-up to do on the zuul side with some questions about what needs to be published etc. i said i'd send to the list
19:44:20 <ianw> on the todo :)
19:44:24 <clarkb> roger
19:44:35 <clarkb> #topic Splitting puppet else into specific infra-prod jobs
19:45:02 <clarkb> I don't think anything has happened on this topic yet. I've been thinking about dropping it from the agenda and writing a help wanted doc for things like this
19:45:20 <clarkb> we'll see how I do with all the other stuff happening this month first
19:45:32 <clarkb> (ptg summit ansiblefest openstack release gerrit upgrade testing so many things)
19:45:41 <clarkb> #topic Bup and Borg Backups
19:46:06 <corvus> a help wanted doc would be useful (looking at the ptg list, looks like auth might be another item there?)
19:46:07 <clarkb> ianw: I don't think the borg change has landed yet but you are building a new backup server. Were you planning to make that a "normal" bup server and convert to borg later or?
19:46:12 <clarkb> corvus: ++
19:46:58 <ianw> so i have started up a new server in vexxhost
19:47:30 <ianw> i have merged the borg change with the idea to apply it to this server
19:47:46 <clarkb> nice
19:48:03 <fungi> looking at the help-wanted section of our specs index (which already includes the auth spec), the irc bot consolidation might be good to drum up interest for as well
19:48:12 <ianw> it's been not top priority but i'm making progress :)
19:48:51 <clarkb> ianw: one thing I've noticed recently with borg locally is that pip installing borg on arm64 is a bit of a pain for all the reasons we've been having python build time issues in other places
19:49:02 <clarkb> not an issue for us today as we don't need to back up any arm64 hosts but something to keep in mind
19:49:20 <clarkb> and the python ecosystem is slowly getting better about that in part due to your work so yay
19:50:07 <clarkb> #topic Open Discussion
19:50:14 <clarkb> Any other items to bring up today?
19:52:41 <clarkb> sounds like that may be it. Thank you everyone
19:52:58 <clarkb> Feel free to bring up discussions on the mailing list or in #opendev
19:52:59 <fungi> thanks clarkb!
19:53:03 <clarkb> #endmeeting