19:01:08 #startmeeting infra
19:01:09 Meeting started Tue Jan 14 19:01:08 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 The meeting name has been set to 'infra'
19:01:15 #link http://lists.openstack.org/pipermail/openstack-infra/2020-January/006570.html Our Agenda
19:01:35 #topic Announcements
19:01:45 OpenStack Foundation Individual Board Member Election happening now. Ends January 17 17:00UTC
19:01:54 o/
19:02:05 if you are an OSF individual member you should've received a ballot via email in the last few days
19:02:38 also remember to keep your e-mail address up to date in your osf profile
19:02:47 so that it doesn't go into the void
19:02:58 (unless that's what you're after)
19:04:19 Also note that the end of polling is Friday
19:04:26 it is a quick one, so don't forget to vote if you intend to
19:05:12 #topic Actions from last meeting
19:05:21 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-01-07-19.01.txt minutes from last meeting
19:05:45 mordred had an action to deploy review-dev with ansible + gerrit
19:05:56 there is a change, https://review.opendev.org/#/c/702496/1 , up to do that
19:05:58 thank you mordred
19:06:05 o/
19:06:18 I had an action item to send email to openstack-discuss re opendev split as well as push a governance change and have not yet done so
19:06:30 been sniped by ssl certs and nodepool things
19:06:46 #action clarkb push openstack governance change to pull out opendev and start discussion with tc via openstack-discuss
19:06:58 With that I think we can dive straight into the ansible + docker things
19:07:04 #topic Config Management Updates
19:07:17 #link https://review.opendev.org/#/c/702496/1 Manage review-dev with ansible and docker
19:07:28 I think we need to land this change then remove review-dev from the emergency file
19:07:53 i just finished reviewing and approved it
19:07:56 was a quick read
19:08:01 thanks mordred!
19:08:44 Is it too early to start thinking about what this transition looks like in production?
19:08:48 mordred: ^
19:09:54 yeah, i too am curious about how we get from point a to point z
19:10:03 now that we know where point z is
19:10:24 (or at least we think we know)
19:10:28 (i think mordred has spotty connections today due to travel) but my understanding is that, yeah, next step is "status quo on review.o.o" but with docker
19:10:44 or, actually, is it podman? :)
19:11:34 can we deploy 2.13 in a docker container? or do we need to upgrade to 3.something locally on a clone of review.o.o and then copy the file tree over and plug it into a dockerized deployment?
19:11:52 fungi: review-dev is a 2.13 container deployment
19:11:58 d'oh, right
19:12:07 and we mount the review site into the container fs so we don't even need to rebuild indexes and similar aiui
19:12:12 so we're changing 1 variable at a time
19:12:19 the transition should actually be pretty quick and easy once we've got everything lined up
19:12:19 so in theory, clone the git tree and mysqldb, take a brief outage to get those in full sync, then cut over?
19:12:42 i think not even that due to the bind mounts
19:12:43 fungi: I don't even think we need to do that beyond any backup considerations
19:13:09 ahh, do it directly on the existing review.o.o server, not a second server?
19:13:16 yep
19:13:18 yes, that's even faster still
19:13:51 review-dev was a redeployment via ansible though, not a cut-over from puppet to ansible? or am i misremembering?
19:14:10 fungi: its a cut over
19:14:10 i think it will mostly be 1 or 2 changes to transfer the production config from puppet to ansible (most of that was taken care of by review-dev, but dev->prod has some deltas) and removing the puppet config.
19:14:34 it was just a really long cutover, mostly spent in the emergency file, as mordred figured out what to do one bit at a time.
19:15:14 okay, yep, i misremembered then. /etc/hostname on review-dev01 is ~10 months old which is well before the dockerization work started for it
19:15:40 but honestly, i think we're looking at something like mordred preparing the change, and then we clear a friday the next time he's around and just do it.
19:15:49 corvus: ++
19:15:59 I really do expect the total outage time to not be much longer than a normal gerrit restart
19:16:02 sounds good to me (as long as it's not friday the 31st)
19:16:15 its essentially stopping the puppet managed sysv init version and starting the container version
19:16:25 yep. so i think we can risk a friday for that.
19:16:29 oh that makes me wonder if we properly cleaned up the sysv init script on review-dev
19:17:06 it appears to be in place and symlinked
19:17:06 I don't think the puppet changes would've done that. I'm happy for that to be manual cleanup too
19:17:28 mordred: ^ if you read scrollback that may be an important piece of the transition that has been missed (so that we properly handle reboots)
19:18:31 i'll cleanup review-dev now
19:18:34 thanks
19:19:01 Anything else on this subject or should we continue?
19:19:18 just excited to see this come to fruition
19:20:06 #topic OpenDev
19:20:18 as mentioned the email and governance change has not yet gone out
19:20:43 it is high on my list but I've had operational items to look at as well as paperwork stuff (yay beginning of the year) to look at too
19:20:49 I'll try to get that done this week
19:21:12 i've got a change up to reorganize the opendev.org dns zonefile. it's got 2x +2 but i didn't want to self-approve without some visibility
19:21:13 The other item I had on this topic was we noticed that we miss gerrit replication events when restarting gitea
19:21:32 ahh, yeah, that's a problem
19:21:32 fungi: has it incorporated the latest LE changes?
19:21:43 fungi: I'm fine with self approving if its up to date
19:21:45 clarkb: i think i based it on top of your zone changes
19:22:02 ahh, nope, there have been more since
19:22:04 fungi: ok, the insecure-ci-registry change came a little later, not sure if it was accommodated
19:22:05 i'll rebase
19:22:07 (sorry)
19:22:16 no worries
19:22:31 #link https://review.opendev.org/701833 [opendev/zone-opendev.org] Reorganize zonefile
19:22:34 for reference
19:22:56 for the gitea and gerrit replication thing I've been tossing that around in the back of my mind a bit and haven't really come up with a great solution
19:23:06 let's do that soon, because i have to imagine it'll be easy to miss a name in a rebase
19:23:09 part of the problem here is gerrit does not see any errors so from its perspective replication has succeeded
19:23:12 corvus: ++
19:23:22 the not-so-great sledgehammer solution is to force full replication immediately following gitea restarts
19:23:38 fungi: ya another idea I had was to run periodic full replications
19:23:55 another was to replicate behind gitea's back
19:24:03 do we think there may be a gerrit bug wrt the "not noticing" bit?
19:24:14 corvus: I think it is a gitea issue
19:24:28 it's possible either one is at fault, we could test
19:24:29 how can that be?
19:24:30 corvus: since we saw similar when the gitconfig lock file was held, gerrit would push, gitea would say ok, but in reality nothing on disk was modified
19:24:54 but ya possible it is gerrit's fault (or jgit)
19:24:57 either gerrit is ignoring push errors or gitea is silently ignoring the pushes/not writing them
19:25:39 i assumed this was some sort of connection refused situation, but you think there's a connection and it's succeeding?
19:26:13 it persisted for over a minute after a container restart (not sure how long exactly)
19:26:28 right, but gitea does not start immediately
19:26:36 corvus: ya and largely based on the gitconfig lock behavior
19:26:55 yeah, it's entirely possible there was no open socket, like i said, probably worth testing
19:27:44 it sounds like we don't have much evidence at the moment. it would probably be worth getting to the bottom of this behavior, but maybe not until after the gerrit upgrade, since there are like 5 years of bugfixes (and new bugs) in there.
19:28:04 i've seen gerrit retry pushes for replication in the past when there are errors (often for a very long time) so i suspect it's not a typical remote outage
19:28:04 'gerrit.sshConnectionTimeout : Timeout for SSH connections. If 0, there is no timeout and the client waits indefinitely. By default, 2 minutes.' is a thing fwiw
19:28:16 if it does come down to connection issues we may be able to work around it via config settings
19:28:34 corvus: ++ to not trying to solve it for old gerrit
19:28:36 so maybe the sledgehammer approach of a full replication after restarts for now?
19:28:51 because ya if it is a bug in the replication plugin there is a good chance it will be fixed as soon as we update
19:29:09 i'm fine with this plan
19:30:49 actually I wonder if we didn't restart the ssh container if that would change anything
19:31:03 corvus: ^ the sshd container is just running opensshd on a non standard port right?
19:31:09 we don't have to update it when we update gitea?
19:31:39 clarkb: yes but -- it executes a gitea command
19:31:59 ssh forced command
19:32:09 ah
19:32:43 could be that sequencing those in some way might help
19:32:52 like shut them both down, then start web, wait 1 minute, start ssh.
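[For illustration, a minimal sketch combining the two workarounds discussed above: restart the gitea containers with the web service coming up before the ssh frontend, then brute force a full replication from the gerrit side. The compose directory, service names and ssh identity below are assumptions made for the sketch rather than the actual opendev configuration, and the last command assumes the gerrit replication plugin's ssh interface is available.]

  # restart gitea, bringing the web service up before the ssh frontend
  cd /etc/gitea-docker              # hypothetical compose directory
  docker-compose stop gitea-ssh gitea-web
  docker-compose up -d gitea-web
  sleep 60                          # give gitea's web process time to start accepting pushes
  docker-compose up -d gitea-ssh
  # then force a full re-replication of all projects from gerrit
  ssh -i ~/.ssh/replication_key -p 29418 admin@review.opendev.org replication start --all --wait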
19:33:14 (otoh, would probably be better to just update error handling in the ssh component if that's the problem)
19:33:36 #agreed Need further debugging on Gerrit replication issues when Gitea is restarted. If Gerrit side issue then work on upgrading Gerrit. If Gitea problem we can sort it out from there.
19:33:38 how is ^
19:34:10 oh i was thinking we don't do any more debugging until after upgrading gerrit. unless someone is bored of course.
19:34:12 i'd also be fine just forcing replication until the gerrit upgrade
19:34:15 yeah, that
19:34:22 #undo
19:34:23 Removing item from minutes: #agreed Need further debugging on Gerrit replication issues when Gitea is restarted. If Gerrit side issue then work on upgrading Gerrit. If Gitea problem we can sort it out from there.
19:34:58 #agreed Wait for Gerrit upgrades before digging into replication issues to gitea when gitea is restarted as new Gerrit may fix. In the mean time we can brute force replication post Gitea updates.
19:35:03 o/
19:35:05 yay internet
19:35:49 Anything else on OpenDev or should we move on?
19:36:04 * mordred agrees with scrollback about gerrit
19:36:11 mordred: you may want to review 19:08-19:15
19:36:53 corvus: ++
19:37:13 #topic General Topics
19:37:23 Static.openstack.org migration
19:37:32 #link https://review.opendev.org/#/q/status:open+topic:static.opendev.org
19:37:38 These changes need reviews.
19:37:44 ianw: it looks like one of them is merge conflicting now
19:38:13 otherwise they looked good to me, though I did leave a few comments you might want to look at if rebasing for the merge conflict
19:38:52 ok, will cycle back
19:38:53 thanks
19:38:55 I did want to call out that zuul-ci.org has delegated its acme stuff to acme.opendev.org similar to how we plan to do for openstack.org domains
19:38:59 and that is all working as expected
19:39:43 thanks again to ianw for building a solid and versatile system :)
19:39:57 yeah, this has been working out great
19:40:29 and to mnaser for reminding us we can use it that way :)
19:40:43 we'll likely have a lot more churn over the next year as old certs expire and we want to not have to pay to renew them
19:40:51 We should try our best to add any newly SSL'd domains to the certcheck list as we add them
19:41:06 (and more importantly, not have to worry about manually swapping them for new certs the following year)
19:41:43 I did take a note yesterday to make sure we do systemctl reload apache2 equivalents in the restart_apache LE handler
19:41:48 I should get to that this afternoon
19:41:54 fungi: ^ unless you already did that
19:42:46 i haven't looked to see what the service unit calls, no. ideally it's something like apache2ctl graceful, which sends sigusr1 to the parent process
19:43:11 k I've got it on my list as cleanup from cert updates as well as updating hiera/hostvars now that we've stopped using some old certs stored there
19:43:16 (same thing the reload action in the initscript does)
19:44:04 Next up is the wiki
19:44:14 fungi: I don't expect there has been much movement on that in the last week?
19:44:45 * fungi consults his magic 8-ball
19:44:47 "don't count on it."
19:44:58 k.
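[A quick reference for the LE handler discussion above: a sketch of how to check what the apache2 unit's reload action invokes and of the graceful reload itself. Exact unit contents vary by distro, so treat this as an assumption about typical Debian/Ubuntu packaging rather than what the opendev servers actually run.]

  systemctl cat apache2 | grep ExecReload   # see what "systemctl reload apache2" invokes
  sudo apache2ctl graceful                  # sends SIGUSR1 to the parent; workers finish current requests, then pick up new certs
  sudo systemctl reload apache2             # equivalent when ExecReload wraps the graceful action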
That takes us to the end of our regularly scheduled agenda
19:45:04 #topic Open Discussion
19:45:33 Setuptools 45 is not compatible with python2
19:45:54 this would've been unnoticed except it still builds a python2 capable wheel (according to the filename) and our wheel mirror served that
19:46:15 ianw: thank you for working on adding the data-requires-python attributes to our package indexes on the wheel mirror
19:46:34 fungi: got us to an intermediate fix via removal of setuptools 45 wheel on our mirror
19:46:57 yeah, the idea of not making wheels available on pypi is still a todo but worthwhile too probably
19:47:13 i mean, us not building wheels that are already available on pypi, to be clearer
19:47:29 yup, it will reduce disk requirements and speed up the jobs that build the wheels we do need
19:47:35 should be a good improvement
19:47:51 and i also started looking at getting centos8 in there for wheel builds, and a request for buster builds too
19:48:18 In general we should probably expect more fallout from python2 EOL
19:48:31 though probably nothing as impactful as setuptools updating
19:50:10 fyi on friday i am moving from sydney to melbourne ... same TZ so not much change there
19:50:48 ianw: does the air quality get better or worse as a result of that?
19:51:36 umm, right now worse, maybe :/
19:52:15 that's quite the cross-country haul
19:52:41 but at least you'll be a short boat ride from jhesketh
19:53:02 for some values of short
19:53:56 yes, will be fun to explore a new part of .au
19:54:20 good luck with the move!
19:54:29 2 years ... maybe ... but who knows! thanks
19:55:12 yes, I hope it goes well
19:55:19 I think that may be it for the meeting
19:55:22 Thank you everyone
19:55:25 See you next week
19:55:27 #endmeeting
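[A footnote on the setuptools 45 item above: a minimal sketch of the two mitigations discussed, assuming a python2 environment and a PEP 503 style wheel mirror. The mirror URL below is a placeholder for the sketch, not one of the actual opendev mirrors.]

  # pin setuptools below 45 anywhere python2 still installs it; 45.0.0 dropped python2 support
  pip install 'setuptools<45'
  # confirm the mirror's simple index now carries data-requires-python hints,
  # which let pip under python2 skip releases that declare Requires-Python >=3.5
  curl -s https://mirror.example.org/wheel/ubuntu-18.04-x86_64/setuptools/ | grep -o 'data-requires-python="[^"]*"'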