19:01:33 <clarkb> #startmeeting infra
19:01:34 <openstack> Meeting started Tue Feb  4 19:01:33 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:35 <zbr> o/
19:01:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:38 <openstack> The meeting name has been set to 'infra'
19:01:41 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2020-February/006595.html Our Agenda
19:01:48 <clarkb> We have an agenda
19:01:57 <ianw> o/
19:02:03 <clarkb> #topic Announcements
19:02:52 <clarkb> I was planning to be out tomorrow to go fishing but was told last night that weather is not good so that probably isn't happening. However, we just had to pick up a sick kid from school so I might be AFK anyway to take care of sick kid
19:03:02 <clarkb> TL;DR I may not be here tomorrow
19:03:47 <clarkb> #topic Actions from last meeting
19:04:03 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-01-28-19.01.txt minutes from last meeting
19:05:03 <clarkb> I had an action to file a story about the gitea OOM process. I did that. Let me find a link
19:05:39 <clarkb> #link https://storyboard.openstack.org/#!/story/2007237
19:06:20 <clarkb> With that I think we can dive right into our priority topics
19:06:23 <clarkb> #topic Priority Topics
19:06:32 <clarkb> #topic OpenDev
19:06:35 <mordred> o/
19:06:42 <clarkb> #link https://review.opendev.org/#/c/703134/ Split OpenDev out of OpenStack Governance
19:06:48 <clarkb> #link https://review.opendev.org/#/c/703488/ Update OpenDev docs with new Governance
19:07:14 <clarkb> I got a bit sniped last week doing config management things, but these changes don't have comments that need to be addressed. Will try to push the TC to move things along
19:07:36 <clarkb> The gitea bug about poor performance on large repos has been closed
19:07:42 <clarkb> lunny added a commit cache to gitea
19:08:06 <clarkb> This is supposed to have a major impact on rendering performance based on the issue and PR conversations
19:08:12 <clarkb> I think this will end up in Gitea 1.12
19:08:45 <mordred> neat!
19:08:47 <clarkb> I'm not sure if we want to run an unreleased version to pick that up early. Or just wait. But thought I would call it out so that people are aware that in the (hopefully near) future gitea performance should improve a lot
19:08:53 <corvus> yowza!
19:09:21 <fungi> that's excellent news
19:09:21 <mordred> we've run unreleased before - we could certainly give it a shot
19:09:32 <fungi> as long as we don't run into cache invalidation issues of course ;)
19:09:35 <mordred> maybe we should at least go ahead and propose a change to try building images from current master so we can see what's broken
19:09:42 <clarkb> mordred: ++
19:09:48 * mordred will propose that change
19:09:52 <clarkb> I think we also have to configure using the cache as it isn't enabled by default
19:09:56 <clarkb> so we can figure that out in those changes too
19:10:04 <fungi> yeah, i'm not opposed to running unreleased gitea for this since it would allow us to give them feedback sooner on real-world performance gains
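[Editor's note: for context on the configuration step clarkb mentions above, enabling the commit cache in gitea is expected to look roughly like the following in app.ini. The section and key names here are assumptions based on the upstream pull request discussion, not a released interface — check the gitea config cheat sheet for the version actually deployed.]

```ini
; Hypothetical sketch of enabling the new last-commit cache.
; Section/key names are assumed from the upstream PR, not confirmed.
[cache.last_commit]
; how long a cached last-commit entry is retained
ITEM_TTL = 8760h
; only cache repositories with at least this many commits
COMMITS_COUNT = 1000
```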
19:11:39 <mordred> https://review.opendev.org/705804 WIP Build gitea images from master
19:11:56 * diablo_rojo sneaks in late
19:12:02 <mordred> we'll see first if it even builds
19:13:01 <clarkb> thank you for that.
19:13:19 <clarkb> As far as operations go I think we'll want to monitor its memory consumption but I don't expect it to be too bad
19:13:27 <clarkb> it's a small number of records
19:13:43 <clarkb> compared to git operations for large repos it should save us memory
19:14:26 <clarkb> Anything else opendev related or should we continue?
19:15:26 <mordred> clarkb: I just made that patch 3 patches
19:15:39 <mordred> one to upgrade us to 1.10.3 - one to upgrade to 1.11 and one to go to master
19:15:45 <mordred> since we should probably do all three of those things :)
19:15:47 <clarkb> makes sense
19:16:21 <clarkb> #topic Update Configuration Management
19:16:26 <fungi> i wonder if the commit cache will also absorb the oom issues we've been seeing
19:16:53 <clarkb> fungi: we'll likely just have to update and monitor
19:17:10 <clarkb> mordred: did you want to give us an update on review-dev?
19:19:06 <mordred> I do!
19:19:39 <mordred> so - I have a patch up to add apache to the ansible ... but that then shone a light on the use of old-style certs and whatnot
19:20:05 <mordred> SO - I've spun up a new review-dev.opendev.org and am working on getting it LE'd
19:20:22 <mordred> so that the role can just be written to work with LE and not need to handle getting cert data from hiera
19:20:35 <mordred> this brings me to the question I have for folks ...
19:20:53 <mordred> how do we deal with functional testing of apaches that are configured to use LE certs
19:20:55 <mordred> ?
19:20:56 <corvus> i guess that's a smooth enough process that incorporating it shouldn't add too much to the whole effort
19:21:18 <mordred> yeah - it's actually super easy so far - less effort than handling the logical branches for the other thing
19:21:30 <mordred> except for the testing question
19:21:38 <clarkb> mordred: we do have functional testing of that using the LE roles, its mostly transparent to the testing
19:22:05 <corvus> do we have a functional test of an apache that uses le?
19:22:09 <clarkb> ianw can probably describe it better. But in your test jobs you add the LE roles as per usual but set the "I'm a test" flag, then it sets up self-signed certs as if they came from LE
19:22:26 <mordred> clarkb: oh - cool
19:22:28 <clarkb> corvus: I don't know that we have it for apache specifically but we have other things doing it for sure
19:22:31 <mordred> https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/static/files/50-security.openstack.org.conf#L20-L22
19:22:38 <mordred> there's an example apache we have with LE certs
19:22:56 <mordred> so there's a flag I can set in a test job and the LE roles will make me self-signed certs and put them in the right place?
19:22:58 <mordred> NEAT
19:23:01 * mordred hands ianw a giant pie
19:23:12 * mordred will figure that out - now that he knows it exists
19:23:28 <clarkb> mordred: https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L1085-L1112 should cover that example
19:23:30 <corvus> meeting is a win already
19:23:59 <mordred> AH
19:24:07 <mordred> I just have to run the letsencrypt service playbook
19:24:16 <mordred> that's so cool
19:24:16 <ianw> (yeah, there's some examples in testinfra too of various flags to connect for testing)
19:24:48 <fungi> it is an amazing emergent behavior
19:25:12 <fungi> (granted one which took some initial effort to support i'm sure)
19:25:26 <corvus> yeah, looks like it's completely automatic because of playbooks/zuul/templates/group_vars/letsencrypt.yaml.j2
19:25:45 <corvus> (a group-var which applies to all letsencrypt hosts but is only included in ci)
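[Editor's note: the CI-only group var corvus points at works roughly as sketched below. The variable name is an illustrative assumption — see playbooks/zuul/templates/group_vars/letsencrypt.yaml.j2 in system-config for the real one.]

```yaml
# Sketch of the CI-only group-var mechanism described above.
# This file applies to all hosts in the letsencrypt group but is
# only rendered into place in test jobs, so production is untouched.
# The variable name below is an assumption for illustration:
letsencrypt_self_sign_only: true
```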
19:25:59 <clarkb> Semi related to ^ I've been working with our docker + ansible + testing stuff recently in the refstack rebuild and it's really neat how we've built something that is testable end to end.
19:26:18 <mordred> clarkb: ++
19:28:16 <corvus> i feel like it may be time for a new conference talk, because "here's how you can test your app end-to-end from source code change through gitops to containerized deployment (with certs!)" is like catnip for sysadmins.
19:28:43 <corvus> throw speculative dns in there and you'd really have something. ;)
19:29:20 <clarkb> the really great non obvious thing is it means people can push changes and have a fair degree of confidence that they will work with having root or setting up a complex local system
19:29:35 <corvus> [without] ^
19:30:38 <clarkb> right without
19:30:52 <clarkb> Anything else on the subject of updating our config management tooling?
19:32:06 <clarkb> #topic General Topics
19:32:17 <clarkb> We made great progress on further removing trusty last week
19:32:43 <clarkb> status.openstack.org has been upgraded. It does need a gerritlib release to get the gerrit stream events portion of elastic-recheck's bot working again
19:32:47 <clarkb> but otherwise it is happy
19:32:49 <mordred> cool
19:33:08 <clarkb> I was planning to do a gerritlib release on Thursday (when I'll have time to keep an eye on it). If someone else wants to do it before then feel free
19:33:10 <mordred> fwiw - review-dev01.opendev.org is running Xenial, since that's what review01.openstack.org is running
19:33:16 <clarkb> mordred: ++
19:33:47 <clarkb> fungi reported success logging into a wiki-dev with some more recent updates
19:33:59 <clarkb> fungi: are we at the point where that is actually automatable to get a working wiki server?
19:34:45 <fungi> not yet, at least not any more "working" than the current state of wiki-dev.o.o
19:35:01 <fungi> some of the plugins aren't operable yet so they need individual troubleshooting
19:35:07 <clarkb> I see
19:35:14 <fungi> particularly the openid login isn't displaying
19:35:21 <fungi> login link
19:35:36 <clarkb> directly navigating to the url to login works now though?
19:35:47 <fungi> i haven't tested that yet
19:36:00 <fungi> also there are a few spam control related config options set on wiki.o.o which need to get copied to the template in the puppet module
19:36:22 <clarkb> are there any outstanding changes that need review?
19:36:43 <fungi> i think we got all the ones from before i left for fosdem merged. hopefully i'll have more up this week
19:36:57 <clarkb> sounds good, thanks
19:37:30 <clarkb> I started looking into a refstack.o.o upgrade last week as well. I quickly realized that a lot of the puppet just wouldn't work anymore because of changes to nodejs and friends on ubuntu
19:38:10 <clarkb> rather than rewrite a bunch of puppet I started looking into deploying refstack with docker and ansible instead. refstack has a dockerfile, but I can't make it work because it accesses files outside of its Dockerfile dir. Also it is a fat container with nginx, mysql, and refstack all running in one container.
19:38:31 <clarkb> What I've done instead is build it on top of our python-builder + python-base images. To get that to work I need changes in refstack to support python3.7
19:38:57 <mordred> ++
19:38:59 <clarkb> Those changes appear to be fairly minimal but we still need someone to maintain the software for this to work. Sorting out if those people exist is where I am at now
19:39:03 * mordred supports clarkb's effort
19:39:22 <clarkb> It sounds like there is probably still interest and I can probably get some reviews (maybe I have them already, haven't checked today yet)
19:39:58 <clarkb> If I can get those changes landed then my next step is adding apache proxy to my change with LE support baked in. Then we schedule a downtime and do a cutover and copy database stuff
19:40:19 <clarkb> #link https://review.opendev.org/#/c/705258/ if you are curious to see what that looks like in its current form
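[Editor's note: the python-builder + python-base pattern clarkb describes is a two-stage build, roughly as sketched below. The refstack specifics are assumptions for illustration — see the linked review 705258 for the actual Dockerfile.]

```dockerfile
# Illustrative two-stage build in the style described above.
# Stage 1: build wheels from source using the opendev builder image.
FROM opendevorg/python-builder as builder
COPY . /tmp/src
RUN assemble

# Stage 2: install only the built artifacts into a slim runtime image.
FROM opendevorg/python-base
COPY --from=builder /output/ /output
RUN /output/install-from-bindep
```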
19:41:21 <clarkb> And that takes us to static.opendev.org progress
19:41:28 <clarkb> ianw: ^ I think you have all the updates on this?
19:41:49 <ianw> yeah i have a few things ready for review
19:42:01 <ianw> #link https://review.opendev.org/#/q/topic:static-services+(status:open+OR+status:merged)
19:42:23 <ianw> in short, this renames the upload-afs role to upload-afs-roots, and adds a new upload-afs-synchronize role
19:42:57 <ianw> upload-afs-roots is very helpful for doc updates where you want to keep various directories but not others, upload-afs-synchronize is more a straight copy, better for things like tarballs
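[Editor's note: a rough sketch of how the two roles ianw describes might be invoked from a publish playbook. The variable names and paths are illustrative assumptions, not the roles' documented interface — see the linked review topic for the real definitions.]

```yaml
# Hypothetical publish playbook contrasting the two roles.
- hosts: localhost
  roles:
    # upload-afs-roots: preserves sibling directories under the
    # target, suitable for per-project docs publishing
    - role: upload-afs-roots
      afs_source: "{{ zuul.executor.work_root }}/docs"
      afs_target: "/afs/.openstack.org/docs/{{ zuul.project.short_name }}"
    # upload-afs-synchronize: a straight copy, suitable for tarballs
    - role: upload-afs-synchronize
      afs_source: "{{ zuul.executor.work_root }}/artifacts/"
      afs_target: "/afs/.openstack.org/project/tarballs.example.org"
```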
19:43:23 <ianw> also, thanks to corvus and clarkb for fixing my oops destroying the zuul kerberos keytab by issuing a new one
19:43:44 <ianw> i have a plan in
19:43:47 <ianw> #link https://storyboard.openstack.org/#!/story/2006598
19:43:51 <corvus> "improved security posture due to unscheduled key rotation" :)
19:44:16 <mordred> \o/
19:44:17 <ianw> task 38607 to add a new keytab for project.tarballs, if anyone would like to comment
19:44:45 <ianw> #link https://review.opendev.org/704913
19:44:50 <clarkb> ianw: do we need to do a careful rollout of those role changes to avoid breaking jobs?
19:44:54 <ianw> is also part but can go in, to setup the servers
19:45:00 <clarkb> (that might already be encoded in the changes?)
19:45:16 <ianw> clarkb: at this point, no, it's not a big switch
19:45:28 <corvus> it's worth noting that in kerberos, whenever you issue a keytab, you invalidate old keytab versions of the same principal.  that's a non-obvious side effect.  we've added warning boxes to the docs to remind folks.
19:45:29 <ianw> we have just one testing job on project-config that actually uses the secret and uploads
19:45:41 <clarkb> got it
19:46:24 <ianw> after this is all in though, yes changes to the base jobs to publish will want careful attention, and synchronizing switching dns for tarballs.<opendev|openstack>.org
19:47:00 <ianw> i think that's it for that, really just reviews right now
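[Editor's note: the keytab pitfall corvus describes above comes from how MIT Kerberos exports keys. An illustrative kadmin session (the principal name and paths are hypothetical):]

```console
# "ktadd" generates NEW random keys for the principal and bumps its
# key version number (kvno) -- silently invalidating every previously
# exported keytab for that principal.
$ kadmin -p admin/admin
kadmin: ktadd -k /etc/project.tarballs.keytab service/project.tarballs
kadmin: getprinc service/project.tarballs
# note the incremented kvno in the output; old keytabs are now dead
```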
19:47:33 <clarkb> thank you for the update. /me makes a note to try and review those
19:47:53 <clarkb> Next and update on the new arm64 ci cloud
19:48:15 <clarkb> unfortunately it looks like nb03.o.o is still unable to talk to us.linaro.cloud. But kevinz is back from holidays and aware of the problem
19:48:28 <clarkb> ianw: ^ have we heard anything new on that?
19:48:56 <ianw> yes we've corresponded some, he has confirmed the issue but has no root cause nor eta for fixing it
19:49:18 <ianw> at this point, we may have to consider bringing up a nodepool builder in the us cloud itself that just uploads there
19:49:29 <clarkb> that seems like a reasonable workaround
19:49:34 <ianw> (although, now i think about it, i haven't tested if *it* can talk to london ...)
19:49:56 <clarkb> ya we might need separate builders :/
19:50:13 <clarkb> not great since we want synchronized images, but workable
19:50:52 <ianw> wget https://uk.linaro.cloud:5000 works on the us mirror ... how odd
19:51:08 <clarkb> probably a firewall somewhere
19:51:09 <ianw> so i guess we could move the builder to the us cloud, and it may work everywhere?
19:51:19 <clarkb> ianw: ya if you can hit uk from us then probably that would work
19:52:08 <ianw> ok, i can look into that path to get it going
19:52:30 <clarkb> Last item on the agenda is an update on the airshipci cloud. The change to start uploading images there just merged. I've also laid out a rough nl02 config for what I think this looks like. Basically two pools. One with generic resources and the other with airship's more special resources. Currently the more special resources are modelled after fortnebula's extra memory labels
19:52:39 <ianw> #link https://storyboard.openstack.org/#!/story/2007195
19:52:46 <ianw> ^ story on arm64 cloud, will add things there as we go
19:53:02 <clarkb> I've got an email out to roman to provide a bit more input on what we need in that second pool and from there I'll work with the cloud to get quotas updated
19:53:19 <clarkb> I do still need to build a mirror but since they are all in europe I figured getting a head start on the quota related stuff would be a good idea
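[Editor's note: the two-pool nodepool layout clarkb sketches above would look roughly like the following in a launcher config. Provider name, labels, and quota numbers are placeholders for illustration, not the real nl02 config.]

```yaml
# Rough sketch of the two-pool layout described above.
providers:
  - name: airship-cloud            # placeholder provider name
    pools:
      - name: main                 # generic resources
        max-servers: 10
        labels:
          - name: ubuntu-bionic
            diskimage: ubuntu-bionic
            min-ram: 8192
      - name: airship              # expanded-memory labels
        max-servers: 4
        labels:
          - name: ubuntu-bionic-expanded
            diskimage: ubuntu-bionic
            min-ram: 32768
```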
19:54:40 <clarkb> And that concludes the planned agenda
19:54:43 <clarkb> #topic Open Discussion
19:54:48 <clarkb> Anything else to bring up?
19:56:22 * fungi has nothing, just excited to catch up on everything that's happened over/around the weekend
19:56:46 <corvus> fungi: any chance you can send out a fosdem report covering zuul topics?
19:58:42 <fungi> i have it on my to do list, yes
19:58:57 <fungi> though the short version is: people like our stickers
19:59:10 <fungi> (and that causes them to pick up literature and learn more)
19:59:17 <fungi> (and to ask about zuul)
19:59:31 <corvus> there were talks too, right?
19:59:36 <fungi> also our table was placed adjacent to the jenkins table both days
20:00:10 <clarkb> and we are at time now
20:00:11 <fungi> there were no zuul-specific talks, there was going to be one but mnaser fell ill and the rest of us didn't find out until his talk was cancelled so couldn't fill in for him
20:00:12 <clarkb> thank you everyone
20:00:28 <clarkb> We'll see you next week
20:00:33 <fungi> thanks clarkb!
20:00:45 <clarkb> and feel free to continue discussion in our normal conversation channels (irc and mailing lists)
20:00:48 <clarkb> #endmeeting