19:01:08 <clarkb> #startmeeting infra
19:01:09 <openstack> Meeting started Tue Feb 19 19:01:08 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 <openstack> The meeting name has been set to 'infra'
19:01:19 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:23 <gary_perkins> o/ Hi!
19:01:32 <ianw> o/
19:02:09 <clarkb> I seem to remember scribbling a note to link to the mailing list threads when I send those out. /me scribbles a bigger note to actually do that
19:02:22 <clarkb> #topic Announcements
19:02:47 <clarkb> I have nothing to announce. Any announcements others would like to call out?
19:03:34 <fungi> nothing springs to mind
19:04:07 <clarkb> I guess I should note the openstack TC election nomination period is open now and ends shortly
19:04:22 <clarkb> so get your self-nomination pushed if that is something you would like to do
19:04:24 <fungi> ~4.75 hours remaining
19:04:55 <clarkb> Then be on the lookout for ballots in the near future
19:05:12 <clarkb> #topic Actions from last meeting
19:05:20 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-02-12-19.01.txt minutes from last meeting
19:05:48 <clarkb> We didn't call this out as an action in the meeting notes, but ianw took the action of updating the letsencrypt spec to consider dns-based acme validation. This has been done and is the subject of our next topic
19:06:07 <clarkb> #topic Specs approval
19:06:13 <clarkb> #link https://review.openstack.org/#/c/587283/ LetsEncrypt Spec now updated with DNS ACME
19:06:26 <clarkb> I don't think this is ready for approval based on the good conversations this has started
19:06:58 <clarkb> It does seem we are making good progress through the conversation though. So please take a look if this interests you at all
19:07:16 <clarkb> and thank you ianw for getting that put together with some poc ansible work to show how it might actually run in production
19:07:17 <fungi> it seems to be zeroing in on a more concrete design
19:07:52 <ianw> yes i've updated from comments yesterday, a small changelog is in the notes for ps14
19:08:22 <ianw> i also updated the POC to provide self-signed certs in a testing environment
19:08:48 <ianw> (one thought for future development is that we could run our own LE, containerised)
19:09:04 <ianw> future use in CI I mean
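For context, the DNS-based ACME validation the spec covers looks roughly like this with a certbot-style client; a sketch only, with a hypothetical hostname, not the spec's final design:

```
# sketch: DNS-01 issuance with certbot; the domain here is hypothetical
certbot certonly --manual --preferred-challenges dns -d files.opendev.org
# certbot prints a challenge token; publish it as a TXT record, then verify:
dig +short TXT _acme-challenge.files.opendev.org
```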
19:09:49 <clarkb> ianw: as far as next steps go, it seems like we are nearing the point where we need to start making some decisions and moving forward? do we think we might be able to put it up for approval next week, if we can get through another round of reviews and post a new ps with some decisions?
19:09:49 <ianw> and responded to corvus' thoughts on breaking up base.yaml in https://review.openstack.org/637645
19:10:12 <clarkb> corvus: fungi ^ you've reviewed it recently too so your thoughts on how quickly we might be able to come to consensus is appreciated
19:10:56 <ianw> clarkb: yes, i think so; i'll commit to responding to any review comments quickly so we can get things going
19:10:57 <corvus> i think i'm generally in favor of what's written.
19:11:15 <corvus> i would love one more update that said "we aren't going to run anything that touches proprietary rax dns"
19:11:53 <fungi> i haven't looked at the diff from the most recent patchset yet
19:11:59 <fungi> skimming now
19:12:06 <clarkb> as the purchasers of openstack.org certs for our services, I'm ok with corvus' request and doing one last bulk purchase round
19:12:23 <clarkb> if that helps swing any opinions
19:12:25 <corvus> but i'm at least happy that we're all on the same page: it's not something we'll use for any new services and its use will only shrink, so i can go along with what's written
19:13:22 <clarkb> alright anything else on this before we move on? (thanks again to everyone that helped push this forward, ianw in particular)
19:13:26 <ianw> my feeling is that it really is quite a small change, and we can have certificates for anything under openstack.org which is something that we've talked about for a long time
19:13:47 <fungi> having previously obtained the certs for openstack.org for a number of years too, i agree that generating them all at once for a renewal is less painful than adding new ones for services we add over time (and new services should be going into opendev.org anyway)
19:14:47 <fungi> also, moving our existing http-only services to opendev.org before switching them to https shouldn't pose a significant burden
19:15:40 <clarkb> seems like we should be able to sort that out in review. Please do review the spec. But I think we need to move to our next topic(s)
19:15:51 <ianw> ++
19:15:53 <clarkb> #topic Priority Efforts
19:15:59 <clarkb> #topic Storyboard
19:16:09 <clarkb> fungi you have good news for us on this topic iirc
19:16:29 <clarkb> Storyboard prod and dev servers are now both running on xenial with local mysql DBs right?
19:16:53 <clarkb> last remaining cleanups are to configure backups?
19:17:16 <fungi> oh, well we replaced the trusty server with a xenial deployment in production (and upped the flavor to help with performance), plus moved the database out of trove and onto the server itself
19:17:38 <mordred> yay xenial!
19:18:01 <fungi> #link https://review.openstack.org/637388 Clean up after StoryBoard replacement maintenance
19:18:05 <SotK> \o/
19:18:10 <prometheanfire> good for another 2 years iirc (EOL)?
19:18:10 <fungi> that's the cleanup change
19:18:27 <fungi> and yeah, i still need to init the remote backups
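For anyone following along, initializing remote backups for a replacement server is typically a one-time step along these lines, assuming our usual bup arrangement (the backup host and path here are hypothetical):

```
# sketch, assuming a bup-based setup; backup server name and path are hypothetical
ssh root@storyboard01.opendev.org
bup init -r backup01.example.opendev.org:/opt/backups/bup-storyboard01
```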
19:18:29 <clarkb> prometheanfire: yup
19:19:03 <fungi> also clarkb spotted that karbor has asked to move to storyboard
19:19:10 <fungi> #link https://review.openstack.org/636574 Update karbor to use storyboard
19:19:57 <fungi> and cinder is adding a new repo to sb to test it out too
19:20:08 <fungi> #link https://review.openstack.org/637613 Add cinderlib project
19:20:15 <clarkb> cinderlib is a library to consume cinder drivers without an api service
19:20:48 <fungi> yeah, i get the impression that'll be handy for some container orchestration system use cases
19:21:02 <smcginnis> oVirt is also adopting it.
19:21:10 <fungi> neat!
19:21:21 <clarkb> anything else to be aware of or help with in storyboard land?
19:22:02 <SotK> If anyone has spare time to look at the attachments patches it'd be much appreciated :)
19:22:48 <clarkb> #link https://review.openstack.org/#/q/status:open+topic:story-attachments storyboard file attachment changes
19:23:41 <fungi> nothing else on my side
19:23:48 <diablo_rojo_phon> Mine either
19:23:53 <clarkb> #topic Configuration Management Updates
19:24:21 <clarkb> Now that I've got pbx off my plate I intend to pick up the puppet 4 upgrades and futureparser changes again.
19:24:53 <clarkb> Also worth noting (because some had missed it): we are now running a service with ansible + docker and no puppet. This service is the insecure image registry that will be used for zuul jobs that build docker images
19:25:38 <fungi> i had forgotten after reviewing many (most?) of those changes even
19:26:41 <clarkb> I think work here has slowed because it's starting to become our new normal. Though there is still more work to be done with puppet upgrades and other service flips to docker.
19:26:48 <clarkb> Are there any other items we need to call out on this topic?
19:27:26 <corvus> i'm planning on having zuul-preview deployed that way too
19:27:39 <corvus> i think it's easy and fast :)
19:27:52 <clarkb> ya once the underlying tooling is in place it seems to be pretty straightforward
19:28:05 <fungi> and sounds like the plan for now is to also do gitea that way?
19:28:07 <clarkb> the big bootstrapping need was the image building wheel and we have that now
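As a rough illustration of the ansible + docker pattern, the compose file for a service like the registry is on the order of this sketch (the paths and options are hypothetical, not our actual config):

```
# sketch of the docker-compose shape for such a service; details hypothetical
version: '2'
services:
  registry:
    image: registry:2
    restart: always
    ports:
      - "5000:5000"
    volumes:
      - /var/registry:/var/lib/registry
```

The ansible side then roughly amounts to writing that file out and running docker-compose up -d.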
19:28:13 <clarkb> fungi: thats our next topic :)
19:28:26 * fungi returns to the present in that case
19:28:28 <clarkb> So maybe time to move on to that
19:28:42 <clarkb> #topic OpenDev
19:29:09 <clarkb> Last we met at this venue, corvus expected we'd have the gitea cluster in a production-like state, ready for us to really start interacting with it
19:29:21 <corvus> reality happened
19:29:39 <clarkb> Then we did a couple gitea upgrades and discovered that gitea had never intended for shared-everything gitea clusters to "work", and they fixed it
19:29:55 <clarkb> long story short, the way gitea is built today you kind of need to run a single gitea process
19:30:07 <corvus> #link gitea bug for shared indexes https://github.com/go-gitea/gitea/issues/5798
19:30:18 <corvus> to be fair it's the search engine that has this problem
19:30:32 <corvus> but the search engine is a major selling point
19:30:45 <clarkb> so after a discussion in #opendev earlier today I think we are leaning towards running multiple shared-nothing gitea instances, similar to how we run multiple shared-nothing cgit instances today
19:31:07 <fungi> which does mean independently replicating to them for now
19:31:37 <mordred> ++
19:31:40 <clarkb> And longer term it sounds like they'll add support for elasticsearch-backed search indexes for the code, which will allow us to go back to the shared-everything cluster model in k8s
19:32:20 <fungi> also, i think we didn't touch on this, but we'll likely need some manner of client affinity in the load balancing layer to deal with replication being out of sync between cluster members
19:32:47 <fungi> (if memory serves we do this in the cgit haproxy instance today?)
19:32:50 <corvus> i'd like to keep the k8s cluster up and running to continue work in parallel on deployment and testing tooling
19:33:11 <corvus> fungi: yeah, i expect us to keep exactly the haproxy config we have now with docker-compose gitea.
19:33:12 <clarkb> fungi: we switched to least connections backend with haproxy but can go back to client affinity based on ip
19:33:25 <clarkb> iirc
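For reference, the difference under discussion is essentially one haproxy directive per backend; a sketch with illustrative server entries:

```
# illustrative haproxy stanza; server names and addresses are hypothetical
backend balance_git_https
    balance source       # client-IP affinity: each client sticks to one backend
    # balance leastconn  # the current cgit behavior: route to least-loaded backend
    server gitea01 192.0.2.10:3000 check
    server gitea02 192.0.2.11:3000 check
```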
19:33:31 <mordred> corvus: basically just switching the backend ips really, right?
19:33:36 <corvus> mordred: yep
19:33:40 <mordred> \o/
19:34:01 <fungi> clarkb: oh, did we not actually end up with any git fetch errors from dropping the client affinity?
19:34:11 <fungi> in that case it's probably safe to do the same with gitea
19:34:11 <corvus> though, maybe it's more complex because we're also changing hostnames... so... i dunno, maybe it's a duplicate haproxy.
19:34:33 <corvus> and actually...
19:34:43 <corvus> do we want to host the docker gitea instances in vexxhost?
19:34:51 <corvus> (that's where we put the k8s cluster)
19:34:57 <fungi> is there a reason not to?
19:35:22 <corvus> only reason i can think of is ipv6, which should be fixed before it matters to us.
19:35:40 <corvus> but if we do, then we definitely want a new lb, since that should be in the same dc as the backends :)
19:36:05 <fungi> have we gotten working reverse dns for v6 addresses in vexxhost yet?
19:36:27 <corvus> fungi: we don't have v6 in vexxhost sjc1 yet.  but it should be there RSN.
19:36:32 <fungi> ahh
19:36:43 <clarkb> corvus: what was the reason for using sjc1 over montreal?
19:36:56 <corvus> well, originally, because magnum was newer there :)
19:37:06 <clarkb> there is working ipv6 in the other region today
19:37:40 <corvus> we could ask mnaser where he'd like us to put them :)
19:37:43 <clarkb> ++
19:37:56 <corvus> either way, it'll be a new load balancer.
19:38:09 <clarkb> Other news in opendev land is we've started booting replacement servers with opendev dns names and CNAMEing openstack records to them
19:38:22 <clarkb> I updated the launch docs with some short notes on how this changes our server launch processes
19:38:37 <clarkb> please do propose updates to that little README if there are inconsistencies with reality
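The resulting record layout follows roughly this pattern (the A record value is from a documentation range, purely illustrative):

```
; sketch of the opendev name + openstack CNAME pattern; address illustrative
storyboard01.opendev.org.   IN A     203.0.113.10
storyboard.openstack.org.   IN CNAME storyboard01.opendev.org.
```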
19:39:18 <mordred> sweet
19:39:42 <fungi> yeah, the storyboard servers are this way now
19:39:53 <clarkb> as is the new pbx
19:40:12 <clarkb> Anything else on opendev?
19:41:21 <clarkb> #topic General Topics
19:41:39 <clarkb> First up frickler points out that I have been bad with emacs and gpg agent and the secrets file
19:41:48 <prometheanfire> do we know why image sizes jumped? https://nb01.openstack.org/images/
19:42:29 <corvus> me too
19:42:40 <clarkb> prometheanfire: we can follow up after the agenda'd topics
19:42:48 <frickler> clarkb: I was also wondering whether we could somehow automate this to be safer, like having a cronjob that checks for running gpg-agents
19:43:01 <corvus> we should write a /usr/local/bin script
19:43:06 <clarkb> frickler: or maybe create a small script to use to run emacs for that file
19:43:06 <corvus> editpw or something
19:43:07 <prometheanfire> clarkb: k
19:43:08 <clarkb> corvus: jinx
19:43:11 <mordred> I have also been bad at emacs
19:43:20 <mordred> clarkb, corvus: ++
19:43:21 <clarkb> corvus: I think that solution will be easy for me to remember once i use it a couple times
19:43:27 <clarkb> frickler: ^ what do you think?
19:43:36 <frickler> sounds good, yes
19:43:38 <corvus> yeah.  there's no way i'll remember "gpg-agent --daemon emacs"
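The proposed wrapper could be as small as this sketch (the script name comes from the discussion above; the secrets-file path is hypothetical):

```
#!/bin/bash
# hypothetical /usr/local/bin/editpw: run emacs under a throwaway gpg-agent so
# the agent, and its cached passphrase, exits when the editor does
exec gpg-agent --daemon emacs "${1:-/root/passwords.gpg}"
```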
19:43:59 <ianw> was this an email?  i've missed this
19:44:00 <corvus> no one think about that too much.
19:44:08 <ianw> re: gpg agent
19:44:28 <fungi> can we configure gpg-agent to run on-demand and terminate immediately by default?
19:44:30 <clarkb> ianw: no frickler pointed it out in irc then added it to the meeting agenda
19:44:43 <clarkb> ianw: the tldr is that running emacs to open the gpg-encrypted file starts a gpg-agent that never goes away
19:44:55 <fungi> i want to say i've seen instructions for how to do that when building and signing debian packages in a chroot
19:45:15 <frickler> ianw: and it caches the passphrase for an hour or so
19:45:19 <clarkb> fungi: the gpg-agent --daemon emacs command is the command version of that
19:45:28 <clarkb> fungi: so it is possible
19:45:40 <ianw> oh right, yeah from time to time i've killed gpg-agents after emacs went bananas and totally messed up the terminal too
19:46:25 <mordred> fungi: we were unable to find any way to cause the correct behavior when we were looking at this in berlin
19:46:45 <clarkb> I think the editpw script solves this in a pretty straightforward way
19:46:49 <clarkb> so I'm good with that option
19:46:56 <mordred> fungi: other than the gpg-agent --daemon emacs ... but if there is a way to just configure the system to not be terrible, that would be awesome
19:47:01 <mordred> clarkb: ++
19:47:41 <clarkb> frickler: is that something you would like to push up?
19:48:02 <fungi> yeah, i'm hunting for a config option there
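If a config-only answer exists, it may be along these lines; the options are real gpg-agent settings, but nobody in the meeting verified they solve the lingering-agent problem:

```
# untested sketch of ~/.gnupg/gpg-agent.conf
default-cache-ttl 0   # drop cached passphrases immediately
max-cache-ttl 0
```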
19:48:19 <clarkb> ok we've only got ~12 minutes so lets get to the next item
19:48:26 <frickler> clarkb: I can give it a try, but if someone else wants to do it, I'm not too sad either
19:48:31 <clarkb> frickler: ok
19:48:37 <clarkb> Next up Trusty server upgrades
19:48:45 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:48:52 <clarkb> fungi and I have made recent progress on this
19:49:05 <clarkb> there are still a few servers outstanding in that list, and they are potentially getting more difficult.
19:49:36 <fungi> yeah, frickler just now approved the inventory addition for the openstackid.org xenial replacement so we're probably close to swapping it out
19:49:40 <clarkb> If you can grab one or two it will go quickly. Another thought was maybe a sprint sometime after the openstack release candidate processes quiet down? (I think that is mid-March?)
19:50:15 <fungi> we probably need to plan another in-place ubuntu release upgrade for lists.o.o
19:50:19 <clarkb> I can send out an email to try and organize a sprint if we think we'd sit down for a few days to focus on this. Maybe let me know in the infra channel today if you think you could help with a sprint
19:51:17 <clarkb> prometheanfire: ok, image size questions. The easiest way to start sorting that out would likely be to grab the qcow2's from before and after the size jump, and either nbd mount them or boot them and run du
19:51:23 <clarkb> prometheanfire: I do not know why there is a size jump
19:51:39 <clarkb> most of that image size cost is git repos iirc. So it is possible we've got a git repo growing quickly
19:51:45 <ianw> i will go back and see why we turned off the automatic size reporting in the dib logs
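For the record, the manual comparison clarkb suggests looks roughly like this (the image filename is hypothetical):

```
# sketch: nbd-mount a qcow2 and measure its contents; filename hypothetical
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 ubuntu-xenial-old.qcow2
sudo mount /dev/nbd0p1 /mnt
sudo du -sh /mnt/opt/git/* | sort -h   # cached git repos are the usual suspects
sudo umount /mnt && sudo qemu-nbd --disconnect /dev/nbd0
```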
19:51:56 <corvus> i'd like to let folks know that i'm continuing work on using the intermediate docker registry.  still hitting bugs, but hopefully soon we'll be able to have speculative docker images, which means we can test changes to images in our deployment tooling before deploying them.
19:52:05 <ianw> it was intended for exactly this :)
19:52:14 <clarkb> corvus: that is exciting
19:52:18 <clarkb> corvus: also yay for ansible bugs/behavior
19:52:31 <mordred> corvus: \o/
19:52:31 <clarkb> #topic Open Discussion
19:52:35 <prometheanfire> clarkb: ya, will probably do that
19:52:37 <clarkb> we've started discussing open things anyway :)
19:52:42 <prometheanfire> ptg schedule?
19:53:29 <clarkb> prometheanfire: I brought it up last week and have an etherpad to start putting ideas on
19:53:32 * clarkb finds link
19:53:48 <clarkb> #link https://etherpad.openstack.org/2019-denver-ptg-infra-planning
19:54:09 <clarkb> I think we're moving at a relatively quick pace on a lot of items so it is hard to know exactly what we'll want to work on at the PTG
19:54:17 <clarkb> but there should be plenty of topics when we get there
19:54:32 <prometheanfire> k
19:55:04 <corvus> i think we should do some containerization, but i don't think we'll know what until much closer.
19:57:08 <clarkb> One thing I've struggled with (and maybe this means I need to try harder to use gertty again) is keeping track of outstanding reviews that need to be done for these various efforts. I think we have been reasonably good about posting stacks/topics that need eyeballs, but maybe I should look at using storyboard worklists again
19:57:27 <ianw> this is where i got sidetracked with graphite; it's probably better to just do the easy upgrade to xenial, then tackle containerisation separately?
19:58:01 <clarkb> ianw: I think that is where we've ended up on a couple other upgrades.
19:58:03 <corvus> clarkb: i try to make sure the topic is set on opendev-gerrit changes
19:58:22 <corvus> it's not perfect
19:58:29 * corvus changes some topics
19:58:32 <clarkb> corvus: ya maybe what I really need is a gerrit dashboard to collect a number of those topics for me
19:58:41 <clarkb> (I'm mostly just brainstorming out loud about this)
19:58:56 <ianw> clarkb: gerrit-dash-creator might help too, certainly a shared one that highlights current important topics
19:59:11 <corvus> well, we're supposed to have that in the specs page
19:59:13 <corvus> http://specs.openstack.org/openstack-infra/infra-specs/
19:59:31 <clarkb> corvus: ya, you can fine tune it a bit more with a dashboard though
19:59:32 <corvus> that url seems broken
19:59:33 <corvus> https://review.openstack.org/#/q/status:open+AND+(topic:storyboard-migration+OR+topic:opendev-gerrit)
19:59:46 <clarkb> like showing all the changes with one +2 and no -1's so we can flush them through quickly, etc
19:59:53 <fungi> sphinx's url highlighting may simply be broken
20:00:02 <ianw> oh right, yeah we could do a little fancier one that filters out the different votes
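A gerrit-dash-creator definition along these lines could express that; a sketch, with the topics taken from the query above and the vote filters illustrative:

```
[dashboard]
title = Infra Priority Efforts
description = Review dashboard for current priority topics
foreach = status:open (topic:storyboard-migration OR topic:opendev-gerrit)

[section "One +2, no -1: ready to flush"]
query = label:Code-Review=2 NOT label:Code-Review<=-1

[section "Needs first review"]
query = NOT label:Code-Review>=2 NOT label:Code-Review<=-1
```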
20:00:10 <fungi> looks like it doesn't include the trailing ) in the hyperlink and instead shows it as text
20:00:12 <corvus> sure, but we don't need that for priority efforts
20:00:19 <clarkb> corvus: that is true
20:00:20 <corvus> there are never more than a handful of those
20:00:25 <clarkb> and we are at time
20:00:30 <fungi> thanks clarkb!
20:00:36 <clarkb> Thank you everyone. I'll send you back to your regularly scheduled day
20:00:39 <corvus> so what i'd love is if we kept up making sure the topics are set right
20:00:40 <clarkb> #endmeeting