19:01:08 <clarkb> #startmeeting infra
19:01:09 <openstack> Meeting started Tue Jan 14 19:01:08 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 <openstack> The meeting name has been set to 'infra'
19:01:15 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2020-January/006570.html Our Agenda
19:01:35 <clarkb> #topic Announcements
19:01:45 <clarkb> OpenStack Foundation Individual Board Member Election happening now. Ends January 17 17:00UTC
19:01:54 <diablo_rojo> o/
19:02:05 <clarkb> if you are an OSF individual member you should've received a ballot via email in the last few days
19:02:38 <fungi> also remember to keep your e-mail address up to date in your osf profile
19:02:47 <fungi> so that it doesn't go into the void
19:02:58 <fungi> (unless that's what you're after)
19:04:19 <clarkb> Also note that the end of polling is Friday
19:04:26 <clarkb> it is a quick one, so don't forget to vote if you intend to
19:05:12 <clarkb> #topic Actions from last meeting
19:05:21 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-01-07-19.01.txt minutes from last meeting
19:05:45 <clarkb> mordred had an action to deploy review-dev with ansible + gerrit
19:05:56 <clarkb> there is a change, https://review.opendev.org/#/c/702496/1 , up to do that
19:05:58 <clarkb> thank you mordred
19:06:05 <corvus> o/
19:06:18 <clarkb> I had an action item to send email to openstack-discuss re opendev split as well as push a governance change and have not yet done so
19:06:30 <clarkb> been sniped by ssl certs and nodepool things
19:06:46 <clarkb> #action clarkb push openstack governance change to pull out opendev and start discussion with tc via openstack-discuss
19:06:58 <clarkb> With that I think we can dive straight into the ansible + docker things
19:07:04 <clarkb> #topic Config Management Updates
19:07:17 <clarkb> #link https://review.opendev.org/#/c/702496/1 Manage review-dev with ansible and docker
19:07:28 <clarkb> I think we need to land this change then remove review-dev from the emergency file
19:07:53 <fungi> i just finished reviewing and approved it
19:07:56 <fungi> was a quick read
19:08:01 <fungi> thanks mordred!
19:08:44 <clarkb> Is it too early to start thinking about what this transition looks like in production?
19:08:48 <clarkb> mordred: ^
19:09:54 <fungi> yeah, i too am curious about how we get from point a to point z
19:10:03 <fungi> now that we know where point z is
19:10:24 <fungi> (or at least we think we know)
19:10:28 <corvus> (i think mordred has spotty connections today due to travel) but my understanding is that, yeah, next step is "status quo on review.o.o" but with docker
19:10:44 <corvus> or, actually, is it podman? :)
19:11:34 <fungi> can we deploy 2.13 in a docker container? or do we need to upgrade to 3.something locally on a clone of review.o.o and then copy the file tree over and plug it into a dockerized deployment?
19:11:52 <clarkb> fungi: review-dev is a 2.13 container deployment
19:11:58 <fungi> d'oh, right
19:12:07 <clarkb> and we mount the review site into the container fs so we don't even need to rebuild indexes and similar aiui
19:12:12 <corvus> so we're changing 1 variable at a time
19:12:19 <clarkb> the transition should actually be pretty quick and easy once we've got everything lined up
19:12:19 <fungi> so in theory, clone the git tree and mysqldb, take a brief outage to get those in full sync, then cut over?
19:12:42 <corvus> i think not even that due to the bind mounts
19:12:43 <clarkb> fungi: I don't even think we need to do that beyond any backup considerations
19:13:09 <fungi> ahh, do it directly on the existing review.o.o server, not a second server?
19:13:16 <corvus> yep
19:13:18 <fungi> yes, that's even faster still
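A rough sketch of the bind-mount approach being discussed, with the compose layout, image name, and paths as assumptions rather than what the actual change uses:

    # docker-compose.yaml fragment (illustrative only; names and paths assumed)
    services:
      gerrit:
        image: opendevorg/gerrit:2.13
        volumes:
          # the existing on-disk review site is mounted into the container,
          # so indexes, git repos and caches carry over unchanged
          - /home/gerrit2/review_site:/var/gerrit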
19:13:51 <fungi> review-dev was a redeployment via ansible though, not a cut-over from puppet to ansible? or am i misremembering?
19:14:10 <clarkb> fungi: it's a cutover
19:14:10 <corvus> i think it will mostly be 1 or 2 changes to transfer the production config from puppet to ansible (most of that was taken care of by review-dev, but dev->prod has some deltas) and removing the puppet config.
19:14:34 <corvus> it was just a really long cutover, mostly spent in the emergency file, as mordred figured out what to do one bit at a time.
19:15:14 <fungi> okay, yep, i misremembered then. /etc/hostname on review-dev01 is ~10 months old which is well before the dockerization work started for it
19:15:40 <corvus> but honestly, i think we're looking at something like mordred preparing the change, and then we clear a friday the next time he's around and just do it.
19:15:49 <clarkb> corvus: ++
19:15:59 <clarkb> I really do expect the total outage time to not be much longer than a normal gerrit restart
19:16:02 <fungi> sounds good to me (as long as it's not friday the 31st)
19:16:15 <clarkb> it's essentially stopping the puppet-managed sysv init version and starting the container version
19:16:25 <corvus> yep.  so i think we can risk a friday for that.
19:16:29 <clarkb> oh that makes me wonder if we properly cleaned up the sysv init script on review-dev
19:17:06 <corvus> it appears to be in place and symlinked
19:17:06 <clarkb> I don't think the puppet changes would've done that. I'm happy for that to be manual cleanup too
19:17:28 <clarkb> mordred: ^ if you read scrollback that may be an important piece of the transition that has been missed (so that we properly handle reboots)
19:18:31 <corvus> i'll cleanup review-dev now
19:18:34 <clarkb> thanks
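For anyone following along, cleaning up a leftover sysv init script on a Debian/Ubuntu host generally looks something like the following (the service name "gerrit" is an assumption here; check what is actually registered on review-dev):

    sudo systemctl stop gerrit         # if the old init-based service is still running
    sudo update-rc.d -f gerrit remove  # drop the /etc/rc*.d symlinks
    sudo rm /etc/init.d/gerrit         # remove the script itself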
19:19:01 <clarkb> Anything else on this subject or should we continue?
19:19:18 <fungi> just excited to see this come to fruition
19:20:06 <clarkb> #topic OpenDev
19:20:18 <clarkb> as mentioned the email and governance change has not yet gone out
19:20:43 <clarkb> it is high on my list but I've had operational items as well as paperwork stuff (yay beginning of the year) to look at too
19:20:49 <clarkb> I'll try to get that done this week
19:21:12 <fungi> i've got a change up to reorganize the opendev.org dns zonefile. it's got 2x +2 but i didn't want to self-approve without some visibility
19:21:13 <clarkb> The other item I had on this topic was we noticed that we miss gerrit replication events when restarting gitea
19:21:32 <fungi> ahh, yeah, that's a problem
19:21:32 <clarkb> fungi: has it incorporated the latest LE changes?
19:21:43 <clarkb> fungi: I'm fine with self approving if its up to date
19:21:45 <fungi> clarkb: i think i based it on top of your zone changes
19:22:02 <fungi> ahh, nope, there have been more since
19:22:04 <clarkb> fungi: ok, the insecure-ci-registry change came a little later, not sure if it was accommodated
19:22:05 <fungi> i'll rebase
19:22:07 <clarkb> (sorry)
19:22:16 <fungi> no worries
19:22:31 <fungi> #link https://review.opendev.org/701833 [opendev/zone-opendev.org] Reorganize zonefile
19:22:34 <fungi> for reference
19:22:56 <clarkb> for the gitea and gerrit replication thing I've been tossing that around in the back of my mind a bit and haven't really come up with a great solution
19:23:06 <corvus> let's do that soon, because i have to imagine it'll be easy to miss a name in a rebase
19:23:09 <clarkb> part of the problem here is gerrit does not see any errors so from its perspective replication has succeeded
19:23:12 <clarkb> corvus: ++
19:23:22 <fungi> the not-so-great sledgehammer solution is to force full replication immediately following gitea restarts
19:23:38 <clarkb> fungi: ya another idea I had was to run periodic full replications
19:23:55 <clarkb> another was to replicate behind gitea's back
19:24:03 <corvus> do we think there may be a gerrit bug wrt to the "not noticing" bit?
19:24:14 <clarkb> corvus: I think it is a gitea issue
19:24:28 <fungi> it's possible either one is at fault, we could test
19:24:29 <corvus> how can that be?
19:24:30 <clarkb> corvus: since we saw similar when the gitconfig lock file was held, gerrit would push, gitea would say ok, but in reality nothing on disk was modified
19:24:54 <clarkb> but ya possible it is gerrit's fault (or jgit)
19:24:57 <fungi> either gerrit is ignoring push errors or gitea is silently ignoring the pushes/not writing them
19:25:39 <corvus> i assumed this was some sort of connection refused situation, but you think there's a connection and it's succeeding?
19:26:13 <fungi> it persisted for over a minute after a container restart (not sure how long exactly)
19:26:28 <corvus> right,  but gitea does not start immediately
19:26:36 <clarkb> corvus: ya and largely based on the gitconfig lock behavior
19:26:55 <fungi> yeah, it's entirely possible there was no open socket, like i said, probably worth testing
19:27:44 <corvus> it sounds like we don't have much evidence at the moment.  it would probably be worth getting to the bottom of this behavior, but maybe not until after the gerrit upgrade, since there are like 5 years of bugfixes (and new bugs) in there.
19:28:04 <fungi> i've seen gerrit retry pushes for replication in the past when there are errors (often for a very long time) so i suspect it's not a typical remote outage
19:28:04 <clarkb> 'gerrit.sshConnectionTimeout : Timeout for SSH connections. If 0, there is no timeout and the client waits indefinitely. By default, 2 minutes.' is a thing fwiw
19:28:16 <clarkb> if it does come down to connection issues we may be able to work around it via config settings
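As quoted above, that option belongs to the replication plugin's configuration; a minimal sketch of what tuning it might look like (the value shown is just the documented default):

    # review_site/etc/replication.config (illustrative)
    [gerrit]
        sshConnectionTimeout = 2 minutes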
19:28:34 <clarkb> corvus: ++ to not trying to solve it for old gerrit
19:28:36 <corvus> so maybe the sledgehammer approach of a full replication after restarts for now?
19:28:51 <clarkb> because ya if it is a bug in the replication plugin there is a good chance it will be fixed as soon as we update
19:29:09 <fungi> i'm fine with this plan
19:30:49 <clarkb> actually I wonder whether not restarting the ssh container would change anything
19:31:03 <clarkb> corvus: ^ the sshd container is just running opensshd on a non standard port right?
19:31:09 <clarkb> we don't have to update it when we update gitea?
19:31:39 <corvus> clarkb: yes but -- it executes a gitea command
19:31:59 <corvus> ssh forced command
19:32:09 <clarkb> ah
19:32:43 <corvus> could be that sequencing those in some way might help
19:32:52 <corvus> like shut them both down, then start web, wait 1 minute, start ssh.
19:33:14 <corvus> (otoh, would probably be better to just update error handling in the ssh component if that's the problem)
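A minimal sketch of the sequencing corvus describes, assuming the gitea web and ssh processes run as separate compose services (the service names here are hypothetical):

    docker-compose stop gitea-ssh gitea-web
    docker-compose start gitea-web
    sleep 60                        # give the web process time to finish starting
    docker-compose start gitea-ssh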
19:33:36 <clarkb> #agreed Need further debugging on Gerrit replication issues when Gitea is restarted. If Gerrit side issue then work on upgrading Gerrit. If Gitea problem we can sort it out from there.
19:33:38 <clarkb> how is ^
19:34:10 <corvus> oh i was thinking we don't do any more debugging until after upgrading gerrit.  unless someone is bored of course.
19:34:12 <fungi> i'd also be fine just forcing replication until the gerrit upgrade
19:34:15 <fungi> yeah, that
19:34:22 <clarkb> #undo
19:34:23 <openstack> Removing item from minutes: #agreed Need further debugging on Gerrit replication issues when Gitea is restarted. If Gerrit side issue then work on upgrading Gerrit. If Gitea problem we can sort it out from there.
19:34:58 <clarkb> #agreed Wait for Gerrit upgrades before digging into replication issues to gitea when gitea is restarted as new Gerrit may fix. In the mean time we can brute force replication post Gitea updates.
19:35:03 <mordred> o/
19:35:05 <mordred> yay internet
19:35:49 <clarkb> Anything else on OpenDev or should we move on?
19:36:04 * mordred agrees with scrollback about gerrit
19:36:11 <corvus> mordred: you may want to review 19:08-19:15
19:36:53 <mordred> corvus: ++
19:37:13 <clarkb> #topic General Topics
19:37:23 <clarkb> Static.openstack.org migration
19:37:32 <clarkb> #link https://review.opendev.org/#/q/status:open+topic:static.opendev.org
19:37:38 <clarkb> These changes need reviews.
19:37:44 <clarkb> ianw: it looks like one of them is merge conflicting now
19:38:13 <clarkb> otherwise they looked good to me, though I did leave a few coments you might want to look at if rebasing for the merge conflict
19:38:52 <ianw> ok, will cycle back
19:38:53 <ianw> thanks
19:38:55 <clarkb> I did want to call out that zuul-ci.org has delegated its acme stuff to acme.opendev.org similar to how we plan to do for openstack.org domains
19:38:59 <clarkb> and that is all working as expected
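The delegation itself is the standard ACME DNS-01 CNAME technique, roughly like the record below (illustrative only; the real record names and targets live in the zone repos):

    _acme-challenge.zuul-ci.org.  IN  CNAME  acme.opendev.org.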
19:39:43 <corvus> thanks again to ianw for building a solid and versatile system :)
19:39:57 <fungi> yeah, this has been working out great
19:40:29 <corvus> and to mnaser for reminding us we can use it that way :)
19:40:43 <fungi> we'll likely have a lot more churn over the next year as old certs expire and we want to not have to pay to renew them
19:40:51 <clarkb> We should try our best to add any newly SSL'd domains to the certcheck list as we add them
19:41:06 <fungi> (and more importantly, not have to worry about manually swapping them for new certs the following year)
19:41:43 <clarkb> I did take a note yesterday to make sure we do systemctl reload apache2 equivalents in the restart_apache LE handler
19:41:48 <clarkb> I should get to that this afternoon
19:41:54 <clarkb> fungi: ^ unless you already did that
19:42:46 <fungi> i haven't looked to see what the service unit calls, no. ideally it's something like apache2ctl graceful, which sends sigusr1 to the parent process
19:43:11 <clarkb> k I've got it on my list as cleanup from cert updates as well as updating hiera/hostvars now that we've stopped using some old certs stored there
19:43:16 <fungi> (same thing the reload action in the initscript does)
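Either of these should do the graceful reload described here, assuming the stock Debian/Ubuntu apache2 packaging:

    sudo systemctl reload apache2   # the unit's reload action wraps apachectl graceful
    sudo apache2ctl graceful        # signals the parent process directly (SIGUSR1)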
19:44:04 <clarkb> Next up is the wiki
19:44:14 <clarkb> fungi: I don't expect there has been much movement on that in the last week?
19:44:45 * fungi consults his magic 8-ball
19:44:47 <fungi> "don't count on it."
19:44:58 <clarkb> k. That takes us to the end of our regularly scheduled agenda
19:45:04 <clarkb> #topic Open Discussion
19:45:33 <clarkb> Setuptools 45 is not compatible with python2
19:45:54 <clarkb> this would've gone unnoticed except it still builds a python2-capable wheel (according to the filename) and our wheel mirror served that
19:46:15 <clarkb> ianw: thank you for working on adding the data-requires-python attributes to our package indexes on the wheel mirror
19:46:34 <clarkb> fungi: got us to an intermediate fix via removal of setuptools 45 wheel on our mirror
19:46:57 <ianw> yeah, the idea of not making wheels available on pypi is still a todo but worthwhile too probably
19:47:13 <ianw> i mean, us not building wheels that are already available on pypi, to be clearer
19:47:29 <clarkb> yup, it will reduce disk requirements and speed up the jobs that build the wheels we do need
19:47:35 <clarkb> should be a good improvement
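For context, the data-requires-python attribute comes from PEP 503: each simple-index entry carries the package's Requires-Python metadata so pip can skip releases that don't support the running interpreter. An illustrative entry (the exact filename and value are assumptions):

    <a href="setuptools-45.0.0-py2.py3-none-any.whl"
       data-requires-python="&gt;=3.5">setuptools-45.0.0-py2.py3-none-any.whl</a>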
19:47:51 <ianw> and i also started looking at getting centos8 in there for wheel builds, and a request for buster builds too
19:48:18 <clarkb> In general we should probably expect more fallout from python2 EOL
19:48:31 <clarkb> though probably nothing as impactful as setuptools updating
19:50:10 <ianw> fyi on friday i am moving from sydney to melbourne ... same TZ so not much change there
19:50:48 <clarkb> ianw: does the air quality get better or worse as a result of that?
19:51:36 <ianw> umm, right now worse, maybe :/
19:52:15 <fungi> that's quite the cross-country haul
19:52:41 <fungi> but at least you'll be a short boat ride from jhesketh
19:53:02 <clarkb> for some values of short
19:53:56 <ianw> yes, will be fun to explore a new part of .au
19:54:20 <fungi> good luck with the move!
19:54:29 <ianw> 2 years ... maybe ... but who knows!  thanks
19:55:12 <clarkb> yes, I hope it goes well
19:55:19 <clarkb> I think that may be it for the meeting
19:55:22 <clarkb> Thank you everyone
19:55:25 <clarkb> See you next week
19:55:27 <clarkb> #endmeeting