#opendev-meeting log

17:00:05 <corvus> #startmeeting opendev-maint
17:00:06 <openstack> Meeting started Fri Apr 10 17:00:05 2020 UTC and is due to finish in 60 minutes.  The chair is corvus. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:09 <openstack> The meeting name has been set to 'opendev_maint'
17:00:23 <corvus> ha, apparently it's opendev_maint :)
17:00:44 <corvus> #status notice etherpad.openstack.org will be offline for about 30 minutes while it is migrated to a new server with a new hostname; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html
17:00:44 <openstackstatus> corvus: sending notice
17:01:05 * mordred is in a screen on etherpad01.opendev.org
17:01:21 <corvus> joined
17:01:39 <fungi> joined as well
17:01:59 <mordred> k. I'm ready to rock and roll there - somebody else want to stop existing etherpad?
17:02:03 <mordred> (
17:02:10 * clarkb is joining
17:02:16 <corvus> i'll stop existing etherpad
17:02:20 <mordred> I'm going to warn everybody - it's like watching paint dry in the screen once this is running
17:02:40 <fungi> oh, for the db dump/source pipeline?
17:02:41 <clarkb> I've joined
17:02:42 <mordred> yup
17:03:29 <corvus> old etherpad is stopped
17:03:35 <mordred> ok. I'm going to run the command
17:03:47 <mordred> it is running
17:04:01 <corvus> neat, old etherpad is running a puppetlabs mcollectived server
17:04:03 <corvus> whatever that is
17:04:10 <openstackstatus> corvus: finished sending notice
17:04:30 <mordred> WOW
17:04:33 <corvus> mordred: is etherpad running on the new server?
17:04:38 <mordred> corvus: it should not be
17:04:44 <mordred> I only started the mariadb service
17:04:53 <corvus> cool, i confirm that's the case :)
17:05:14 <clarkb> mcollective was puppet's message bus for doing orchestration-like tasks
17:05:17 <corvus> should we start the dns change now?
17:05:37 <corvus> i believe we should change etherpad.openstack.org cname to point to etherpad.opendev.org ?
17:05:45 <mordred> yeah - I think that's a good idea
17:06:08 <corvus> i'll get started on that while clarkb and fungi confirm :)
17:06:29 <clarkb> ++
17:07:43 <fungi> yes definitely
17:07:53 <fungi> to give the change time to propagate
17:08:17 <fungi> presumably the plan is to delete the existing a/aaaa rrs for etherpad.openstack.org and replace it with a cname to etherpad.opendev.org
17:09:02 <corvus> etherpad.openstack.org is currently a cname for etherpad01
17:09:07 <corvus> etherpad.openstack.org is currently a cname for etherpad01.openstack.org
17:09:17 <corvus> i was going to change it to be a cname for etherpad.opendev.org
17:09:33 <corvus> so the result will be etherpad.openstack.org -> etherpad.opendev.org -> etherpad01.opendev.org
17:09:39 <clarkb> corvus: ++
17:09:41 <mordred> corvus: I think that's correct
17:10:09 <fungi> ahh, right, so just update the cname, even easier
17:10:11 <corvus> there's just one problem; i don't see etherpad.openstack.org in the list of records in the rax web ui
17:10:20 <corvus> it was there when i changed the ttl a few days ago
17:10:36 <fungi> scroll all the way to the end and then keyword search?
17:10:37 <corvus> is there some kind of limit?
17:10:44 <mordred> the rax records are paged and sorted by type
17:10:50 <corvus> fungi: that is my usual procedure which i have done
17:10:51 <fungi> it only pages in some at a time and you have to scroll
17:10:55 <mordred> weird
17:10:57 <fungi> ahh, i can try
17:11:02 <clarkb> the length of the db backup is making me think about this. What's the disk situation like on the new server? it has a 50GB volume and is currently using ~3GB of that for the prod db?
17:11:04 <mordred> also - https://review.opendev.org/#/c/718764 can be landed now
17:11:12 <corvus> wait i found it
17:11:13 <fungi> standing down!
17:11:22 <clarkb> also ^F doesn't work properly
17:11:22 <corvus> ctrl-f was not bringing it up
17:11:23 <mordred> corvus: once it's loaded it's about 30G of data
17:11:30 <mordred> gah
17:11:32 <mordred> clarkb: ^^
17:11:34 <clarkb> mordred: is 50GB big enough?
17:11:38 <corvus> but scrolling to it, it shows up (and it's highlighted)
17:11:44 <mordred> that's what the volume was on the old one
17:11:49 <clarkb> mordred: ah ok
17:12:03 <clarkb> and we can always attach another volume and grow the lv
17:12:21 <clarkb> now that I've said ^ and checked lvs I'm far less worried :)
17:12:28 <mordred> ++
17:12:44 * fungi checks paint, still sticky
17:12:49 <mordred> that said - I was totally a shemp when I attached that volume so the lv has a stupid name
17:13:04 <corvus> #info updated etherpad.openstack.org. CNAME from etherpad01.openstack.org. to etherpad.opendev.org.
17:13:23 <corvus> i left the ttl at 300
17:13:45 <mordred> cool
17:14:00 <corvus> do we have an ssl cert for etherpad.openstack.org on etherpad01.opendev.org?
17:14:08 <fungi> yeah, i already tested that bit
17:14:22 <corvus> cool, i thought so, just running through things again :)
17:14:28 <mordred> if you want to watch the db size grow:
17:14:30 <mordred> ls -ltrah /var/etherpad/db/etherpad@002dlite/store.ibd
17:14:36 <mordred> on etherpad01.opendev.org
17:15:22 <clarkb> ya and the LE verification failed the first time around because dns wasn't set up properly for it to verify
17:15:24 <fungi> X509v3 Subject Alternative Name: DNS:etherpad.opendev.org, DNS:etherpad.openstack.org, DNS:etherpad01.opendev.org
17:15:29 <fungi> according to openssl
17:15:50 <mordred> woot
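The check fungi describes (confirming that the certificate on the new server covers both the old and the new hostname) can be sketched as follows. This is a minimal illustration only: the SAN line is copied from fungi's paste above, while `san_dns_names` and `REQUIRED` are hypothetical names introduced here, and the comparison is an exact string match that ignores wildcard entries.

```python
# SAN line as pasted in the log (reported by openssl).
SAN_LINE = (
    "X509v3 Subject Alternative Name: DNS:etherpad.opendev.org, "
    "DNS:etherpad.openstack.org, DNS:etherpad01.opendev.org"
)

# Hostnames that clients will use after the migration (assumed for this sketch).
REQUIRED = {"etherpad.opendev.org", "etherpad.openstack.org"}


def san_dns_names(line):
    """Return the set of DNS names listed in a one-line SAN description."""
    _, _, names = line.partition("Name:")
    result = set()
    for entry in names.split(","):
        entry = entry.strip()
        if entry.startswith("DNS:"):
            result.add(entry[len("DNS:"):])
    return result


missing = REQUIRED - san_dns_names(SAN_LINE)
print("missing names:", sorted(missing) if missing else "none")
```

An empty `missing` set means every required hostname appears in the certificate, which is consistent with what fungi reported.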
17:16:28 <corvus> etherpad.openstack.org.	300	IN	CNAME	etherpad.opendev.org.
17:16:28 <corvus> etherpad.opendev.org.	299	IN	CNAME	etherpad01.opendev.org.
17:16:28 <corvus> etherpad01.opendev.org.	218	IN	A	104.130.124.120
17:16:37 <corvus> that's what i get from dig now
17:17:10 <clarkb> corvus: looks perfect
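The chain corvus pasted from dig can be followed mechanically. The sketch below is an offline illustration, not a real resolver: the records are copied from the dig output above with TTLs dropped, and `RECORDS`, `resolve_chain`, and `max_hops` are hypothetical names introduced for this example.

```python
# Records copied from the dig output in the log; TTLs are omitted.
RECORDS = {
    "etherpad.openstack.org.": ("CNAME", "etherpad.opendev.org."),
    "etherpad.opendev.org.": ("CNAME", "etherpad01.opendev.org."),
    "etherpad01.opendev.org.": ("A", "104.130.124.120"),
}


def resolve_chain(name, records, max_hops=8):
    """Follow CNAMEs from name until an A record; return (chain, address)."""
    chain = [name]
    for _ in range(max_hops):
        rtype, value = records[chain[-1]]
        if rtype == "A":
            return chain, value
        chain.append(value)
    raise RuntimeError("CNAME chain too long or looping")


chain, address = resolve_chain("etherpad.openstack.org.", RECORDS)
print(" -> ".join(chain), "=", address)
```

The hop limit is a guard against CNAME loops; the value 8 is arbitrary.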
17:17:45 <corvus> and cool, the http redirect is working
17:17:58 <corvus> (because apache is up; it's just the eplite service that's down)
17:19:27 <mordred> while we're waiting - it occurred to me recently - is having apache on the host rather than in a docker container and in the compose file the right choice? would it make more sense to run it as an apache container as well?
17:20:29 <clarkb> mordred: ya I was thinking about that back when I thought refstack might grow some momentum again. I think if we want to go away from using host networking having a host run webproxy is nice though it could be the one host network container too
17:20:59 <fungi> right, i tested the redirect yesterday as well, albeit with the etherpad service down and apache serving an error for it
17:21:12 <fungi> so looks like what i got from my local /etc/hosts edit
17:21:33 <mordred> clarkb: yeah - I was thinking about it from a "what would be different about these container services if we decided to roll out k8s"
17:21:55 <clarkb> mordred: if we rolled out k8s we'd probably use the nginx ingress controller for a good chunk of that ?
17:22:00 <corvus> i'm ambivalent about whether we run apache in a container or not; if we did, we could still use host networking
17:22:04 <clarkb> though services like etherpad need rewriting which I don't know that it can do
17:22:31 <corvus> clarkb: we would use *some kind* of ingress controller, not necessarily the nginx one, depending on what our load balancer situation was like
17:22:38 <clarkb> fair
17:22:39 <corvus> and many of them can rewrite
17:22:39 <mordred> clarkb: yeah - I think we can still run apache behind the ingress controller in those cases - so that we don't have to rewrite all of our rewrites
17:22:48 <mordred> but also - cloud load balancers are a thing too
17:23:19 <mordred> when we did the gitea setup, we used a cloud load balancer that attached to exposed service of each pod running
17:23:49 <clarkb> and that cloud load balancer was running haproxy not nginx :)
17:23:53 <mordred> that said - in our current clouds we can do the same thing only with nginx ingress if we use VRRP to manage which thing owns the VIP
17:24:19 <mordred> if we don't want to rely on a cloud load balancer
17:24:46 <mordred> I know that it's possible to create VRRP-enabled ports in neutron in vexxhost
17:25:27 <clarkb> mordred: ya the basic requirement is being able to control a shared l2 network between the instances with the 3 IPs on that network
17:26:05 <clarkb> though maybe you don't even need the third ip on that network if you can vrrp separately? it's been a while since I had to do vrrp
17:26:08 <corvus> here's an ingress controller config for gke with a path mapping (to /, but the syntax is there to imagine other roots); so it's doing layer 7 load balancing -- https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/k8s/zuul.yaml#315
17:27:14 <fungi> clarkb: yeah, technically you can have vrrp/hsrp/carp use only two addresses (though a third makes it somewhat easier)
17:29:28 <mordred> corvus: so that ingress setup seems like it's mapping a single external ip to the resources?
17:30:43 <clarkb> mordred: I think it's a name not an ip
17:30:50 <clarkb> (so they could do magic with dns potentially)
17:31:08 <mordred> kubernetes.io/ingress.global-static-ip-name: "zuul-static-ip"
17:31:13 <mordred> is what I was keying off of
17:31:52 <corvus> mordred: yes, it's a single pre-allocated static ip
17:32:10 <corvus> (i previously ran "gcloud get me a static ip named zuul-static-ip")
17:32:15 <clarkb> ah
17:32:21 <clarkb> so it's referencing cloud resources outside of k8s
17:32:24 <mordred> nod. so pattern-wise (ignoring mechanics for a sec) - that would potentially map to the sorts of things we'd want to do
17:32:24 <corvus> yep
17:33:40 <mordred> so figuring out the equiv pattern for us inside of a k8s in openstack would be a key piece if we wanted to explore using k8s for services instead of compose
17:37:03 <clarkb> we are at 13GB used
17:41:33 <clarkb> and now 15GB this paint is sticky
17:44:02 <mordred> yeah
17:44:06 <fungi> "wet data, do not touch"
17:44:10 <mordred> seems to be running slower today
17:48:40 <fungi> it is a holiday
17:48:46 <corvus> we're expecting it to be how big?
17:48:55 <fungi> ~30gb clarkb said?
17:48:56 <corvus> 30g right?
17:49:04 <clarkb> ya that's what mordred said above
17:49:10 <fungi> oh, got it
17:49:57 <corvus> so we're 36 minutes away from completion
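The log does not say how the 36-minute figure was derived. One plausible approach is a linear extrapolation from the two sizes clarkb reported (13GB at 17:37:03 and 15GB at 17:41:33) toward the assumed 30GB target. The sketch below shows that arithmetic only; `estimate_completion` is a hypothetical helper, the constant-rate assumption is a simplification, and the result need not match corvus's estimate.

```python
from datetime import datetime, timedelta


def estimate_completion(first, second, target_gb):
    """Linearly extrapolate when the load reaches target_gb.

    first and second are (timestamp, size_in_gb) samples, oldest first.
    """
    (t0, gb0), (t1, gb1) = first, second
    rate = (gb1 - gb0) / (t1 - t0).total_seconds()  # GB per second
    if rate <= 0:
        raise ValueError("size is not growing between the two samples")
    remaining_seconds = (target_gb - gb1) / rate
    return t1 + timedelta(seconds=remaining_seconds)


# Samples taken from clarkb's messages in the log; 30 GB is mordred's estimate.
day = "2020-04-10 "
first = (datetime.fromisoformat(day + "17:37:03"), 13.0)
second = (datetime.fromisoformat(day + "17:41:33"), 15.0)
eta = estimate_completion(first, second, 30.0)
print(eta.strftime("%H:%M:%S"))
```

With these two samples the rate is 2GB per 270 seconds, so the remaining 15GB would take about 34 minutes from the second sample. Two samples give a noisy rate, so an estimate like this is best rounded generously.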
17:50:54 <corvus> status notice The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC
17:50:57 <corvus> should we send that?
17:51:17 <fungi> yeah, warranted
17:51:19 <clarkb> ++
17:51:22 <corvus> #status notice The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC
17:51:22 <openstackstatus> corvus: sending notice
17:51:29 <corvus> i'm going to afk for about 30m
17:52:22 <fungi> once maintenance is concluded, it may be time to prepare for my annual viewing of "the life of brian"
17:53:33 <clarkb> I'll be making a tunafish sandwich for lunch when this is done
17:53:39 <mordred> fungi, clarkb : while you're waiting: https://review.opendev.org/#/c/718764/
17:54:41 <mordred> and actually - I think we can not land that yet
17:54:48 <openstackstatus> corvus: finished sending notice
17:55:15 <mordred> and land it once we take etherpad out of the emergency file to ... no, that's too laggy. nevermind me
17:55:16 <clarkb> https://review.opendev.org/#/c/719051/ another good one to review though it had a post failure
17:55:20 <mordred> I think we can land it whenever
17:57:30 <mordred> clarkb: and this one remote:   https://review.opendev.org/719053 Set env vars pointing to correct file locations
18:01:57 <mordred> and remote:   https://review.opendev.org/719052 Fix issues from rolling out containers
18:02:10 <mordred> infra-root db migration done
18:02:21 <mordred> I might have been wrong about db size
18:02:33 <fungi> or there were a lot of zeroes at the end
18:02:39 <clarkb> or newer mysql is more compact
18:02:56 <mordred> I think actually 32G of free space on device is what I was looking at :)
18:02:57 <fungi> so ready to start up the container?
18:03:01 <mordred> yeah - I think so
18:03:19 <mordred> any last concerns?
18:04:03 <fungi> none for me
18:04:04 <clarkb> none from me
18:04:08 <mordred> k. here we go
18:04:41 <mordred> k. I reloaded an openstack etherpad, it redirected to opendev and all is good
18:05:04 <fungi> i reconnected to a pad i already had open and got sent to the right (new) place
18:05:09 <mordred> we might want to keep our eyes on this as it gets usage - might need to tune the my.cnf settings
18:05:32 <fungi> didn't even reload, just clicked the reconnect button from when it got disconnected during the shutdown
18:05:56 <fungi> we did at least incorporate the apache tuning we had on the old deployment, right?
18:06:27 <mordred> yeah
18:06:40 <mordred> innodb_buffer_pool_size	= 256M is the one I think might be applicable
18:06:51 <fungi> tested out a few more pads, not seeing any problem yet
18:06:54 <clarkb> mordred: thinking it may need to be bigger?
18:06:55 <mordred> but honestly, 256M of hot data isn't bad
18:07:26 <clarkb> and ya I think individual etherpads tend to be pretty small. It's the history data that grows (I wonder if we can tune it to prefer the newer pad data)
18:07:57 <mordred> it'll do that naturally - the buffer pool will only contain the most recently touched pages
18:08:20 <mordred> so I think it should be fine
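mordred's point, that a fixed-size buffer pool naturally ends up holding the most recently touched pages, can be illustrated with a plain least-recently-used cache. This is a deliberate simplification: InnoDB uses a variant of LRU with midpoint insertion rather than the strict LRU shown here, and `LRUPageCache` and the page names are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Fixed-capacity cache that evicts the least recently touched page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, newest last

    def touch(self, page_id):
        """Access a page; return True on a cache hit."""
        hit = page_id in self.pages
        if hit:
            # Mark the page as most recently used.
            self.pages.move_to_end(page_id)
        else:
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                # Drop the least recently used page.
                self.pages.popitem(last=False)
        return hit


cache = LRUPageCache(capacity=3)
accesses = ["old-pad-1", "old-pad-2", "new-pad", "new-pad", "newer-pad", "newest-pad"]
for page in accesses:
    cache.touch(page)
resident = list(cache.pages)
print(resident)
```

After the accesses above, only the three most recently touched pages remain resident; the older pad pages were evicted without any explicit tuning.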
18:09:09 <mordred> in other news, my new dowel-style rolling pin has arrived
18:11:30 <fungi> have fun! i still just use a boring old marble cylinder roller
18:11:41 <fungi> but i like the extra weight
18:12:27 <mordred> are you saying I'm fat?
18:13:12 <fungi> heh
18:13:35 <clarkb> that post failure was due to an rsync failure fwiw
18:13:42 <clarkb> mordred's approval seems to have rechecked it
18:14:29 <clarkb> do we need to send an all clear now? and maybe end the meeting?
18:14:40 <clarkb> not sure what other work there is to do other than following up on gerrit jeepyb things
18:18:33 <mordred> I think we should end the meeting - don't know if we need an all clear
18:18:36 <mordred> I think this one is good
18:19:09 <mordred> we might need to restart etherpad to pick up the settings.json update - but that should be a thing that can just be done - in the margin of error of an internet facing service connectivity
18:19:37 <mordred> oh - we need to take etherpad01.opendev.org out of emergency - shall I do that?
18:19:41 <clarkb> ++
18:19:58 <clarkb> and then sometime next week clean up the old server and db? probably after we have backups running for the new server?
18:20:17 <mordred> no - we need to land ...
18:20:36 <mordred> https://review.opendev.org/#/c/719036/
18:20:39 <mordred> and then ... one sec
18:20:40 <fungi> mordred: are we missing an equivalent of https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/templates/gerrit_patchset-created.erb ?
18:21:58 <clarkb> mordred: comment on https://review.opendev.org/#/c/719036/1
18:22:22 <fungi> nevermind, found it at https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/patchset-created.j2
18:23:14 <mordred> clarkb: updated - and pushed up 2 additional
18:24:02 <mordred> fungi: oh - I missed that one in the patch didn't I?
18:24:10 <fungi> mordred: yeah, i commented
18:24:20 <fungi> since it's a template it's not in the same directory
18:24:27 <mordred> ++
18:24:35 <fungi> though maybe make it not a template?
18:24:39 <corvus> o/
18:25:18 <fungi> it's only templated so we can toggle the welcome message feature on the existence or absence of a welcome_message_gerrit_ssh_private_key value
18:25:33 <clarkb> mordred: and do we expect that to noop for review01.openstack.org? I guess since it's already configured?
18:25:35 <mordred> fungi: yeah - which doesn't exist on review-dev I think
18:25:42 <fungi> which i expect was more transitional or for the benefit of people who might reuse our hook scripts
18:25:56 <mordred> clarkb: I think the backup group is intended to be a normal group for servers we back up?
18:26:00 <fungi> anyway, yeah, drop the conditional, move to files, add envvar exports
18:26:03 <clarkb> mordred: aha got it
18:26:06 <mordred> the backup-server is the only one we only run sometimes
18:26:13 <clarkb> also I accidentally added a +W on that group change. I've removed that
18:26:14 <mordred> (see the two followup patches)
18:26:47 <mordred> fungi: no - I think review-dev doesn't have that key
18:27:03 <mordred> fungi: we'd need to add one for it - and a welcome message user
18:27:51 <mordred> that said ...
18:30:03 <mordred> fungi: I updated it - I think you'll like it now
18:32:02 <mordred> corvus: does the stack at https://review.opendev.org/#/c/719077/ look right to you?
18:33:13 <corvus> mordred: yeah -- though what was the conclusion about puppet managing backups on review?
18:33:29 <corvus> (have we confirmed that's gone?)
18:33:42 <mordred> those would be cron jobs right?
18:33:52 <clarkb> mordred: yes cron jobs
18:33:59 <clarkb> and since puppet isn't running it's not managing it
18:34:12 <clarkb> would mostly just be ensuring ansible applies the same or similar cron jobs and bup config
18:34:23 <mordred> yeah. let me remove the bup cronjob
18:34:35 <mordred> there's also 2 other cronjobs we have for root we need to add to ansible
18:34:41 <mordred> but I'll leave them for now
18:34:51 <mordred> until we have the patch to replace them
18:35:23 <mordred> k. bup cronjob on review01.opendev.org has been removed - we should expect ansible to add one now
18:35:33 <mordred> lemme make a patch to add the others
18:35:58 <clarkb> service-backup should apply it
18:36:07 <clarkb> when you add the server to the backup group
18:36:18 <clarkb> (I don't know what triggers that playbook though)
18:40:24 <mordred> clarkb: well - we have a patch to trigger all playbooks on inventory changes
18:40:30 <mordred> that hasn't landed
18:40:39 <mordred> https://review.opendev.org/719088 <-- gerrit cron jobs
18:41:09 <mordred> clarkb: I take it back - inventory changes trigger everything now: https://review.opendev.org/71908
18:41:25 <mordred> clarkb: so adding and removing the things to groups should cause the backup playbook to run
18:41:38 <clarkb> k
18:41:55 <clarkb> mordred: that link is missing a digit
18:42:57 <mordred> clarkb: https://review.opendev.org/#/c/717114/ is what I meant
18:43:50 <clarkb> specifically line 1716 of that change covers this case
18:43:56 <mordred> yeah
18:44:55 <mordred> hah
18:47:40 <corvus> looks like it's time to end the meeting
18:47:50 <corvus> #endmeeting