19:01:33 #startmeeting infra
19:01:34 Meeting started Tue Feb 4 19:01:33 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:35 o/
19:01:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:38 The meeting name has been set to 'infra'
19:01:41 #link http://lists.openstack.org/pipermail/openstack-infra/2020-February/006595.html Our Agenda
19:01:48 We have an agenda
19:01:57 o/
19:02:03 #topic Announcements
19:02:52 I was planning to be out tomorrow to go fishing but was told last night that the weather is not good, so that probably isn't happening. However, we just had to pick up a sick kid from school, so I might be AFK anyway to take care of the sick kid
19:03:02 TL;DR I may not be here tomorrow
19:03:47 #topic Actions from last meeting
19:04:03 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-01-28-19.01.txt minutes from last meeting
19:05:03 I had an action to file a story about the gitea OOM process. I did that. Let me find a link
19:05:39 #link https://storyboard.openstack.org/#!/story/2007237
19:06:20 With that I think we can dive right into our priority topics
19:06:23 #topic Priority Topics
19:06:32 #topic OpenDev
19:06:35 o/
19:06:42 #link https://review.opendev.org/#/c/703134/ Split OpenDev out of OpenStack Governance
19:06:48 #link https://review.opendev.org/#/c/703488/ Update OpenDev docs with new Governance
19:07:14 I got a bit sniped last week doing config management things, but these changes don't have comments that need to be addressed. Will try to push the TC to move things along
19:07:36 The gitea bug about poor performance on large repos has been closed
19:07:42 lunny added a commit cache to gitea
19:08:06 This is supposed to have a major impact on rendering performance based on the issue and PR conversations
19:08:12 I think this will end up in Gitea 1.12
19:08:45 neat!
19:08:47 I'm not sure if we want to run an unreleased version to pick that up early. Or just wait. But thought I would call it out so that people are aware that in the (hopefully near) future gitea performance should improve a lot
19:08:53 yowza!
19:09:21 that's excellent news
19:09:21 we've run unreleased before - we could certainly git it a shot
19:09:32 as long as we don't run into cache invalidation issues of course ;)
19:09:35 maybe we should at least go ahead and propose a change to try building images from current master so we can see what's broken
19:09:42 mordred: ++
19:09:48 * mordred will propose that change
19:09:52 I think we also have to configure the cache since it isn't enabled by default
19:09:56 so we can figure that out in those changes too
19:10:04 yeah, i'm not opposed to running unreleased gitea for this since it would allow us to give them feedback sooner on real-world performance gains
19:11:39 https://review.opendev.org/705804 WIP Build gitea images from master
19:11:56 * diablo_rojo sneaks in late
19:12:02 we'll see first if it even builds
19:13:01 thank you for that.
19:13:19 As far as operations go I think we'll want to monitor its memory consumption but I don't expect it to be too bad
19:13:27 it's a small number of records
19:13:43 compared to git operations for large repos it should save us memory
19:14:26 Anything else opendev related or should we continue?
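For context on the "configure the cache" point above: enabling the new commit cache should just be an app.ini tweak in our gitea images. A rough sketch, assuming the settings land under a [cache.last_commit] section as discussed upstream; the section and key names here are guesses to confirm against whatever Gitea version we end up deploying:

    [cache.last_commit]
    ; hypothetical settings; verify names and defaults against the Gitea docs
    ENABLED = true
    ITEM_TTL = 8760h
    COMMITS_COUNT = 1000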
19:15:26 clarkb: I just made that patch 3 patches
19:15:39 one to upgrade us to 1.10.3 - one to upgrade to 1.11 and one to go to master
19:15:45 since we should probably do all three of those things :)
19:15:47 makes sense
19:16:21 #topic Update Configuration Management
19:16:26 i wonder if the commit cache will also absorb the oom issues we've been seeing
19:16:53 fungi: we'll likely just have to update and monitor
19:17:10 mordred: did you want to give us an update on review-dev?
19:19:06 I do!
19:19:39 so - I have a patch up to add apache to the ansible ... but that then shone a light on the use of old-style certs and whatnot
19:20:05 SO - I've spun up a new review-dev.opendev.org and am working on getting it LE'd
19:20:22 so that the role can just be written to work with LE and not need to handle getting cert data from hiera
19:20:35 this brings me to the question I have for folks ...
19:20:53 how do we deal with functional testing of apaches that are configured to use LE certs
19:20:55 ?
19:20:56 i guess that's a smooth enough process that incorporating it shouldn't add too much to the whole effort
19:21:18 yeah - it's actually super easy so far - less effort than handling the logical branches for the other thing
19:21:30 except for the testing question
19:21:38 mordred: we do have functional testing of that using the LE roles, it's mostly transparent to the testing
19:22:05 do we have a functional test of an apache that uses le?
19:22:09 ianw can probably describe it better. But in your test jobs you add the LE roles as per usual but set the "I'm a test" flag, and then it sets up self-signed certs as if they came from LE
19:22:26 clarkb: oh - cool
19:22:28 corvus: I don't know that we have it for apache specifically but we have other things doing it for sure
19:22:31 https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/static/files/50-security.openstack.org.conf#L20-L22
19:22:38 there's an example apache we have with LE certs
19:22:56 so there's a flag I can set in a test job and the LE roles will make me self-signed certs and put them in the right place?
19:22:58 NEAT
19:23:01 * mordred hands ianw a giant pie
19:23:12 * mordred will figure that out - now that he knows it exists
19:23:28 mordred: https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L1085-L1112 should cover that example
19:23:30 meeting is a win already
19:23:59 AH
19:24:07 I just have to run the letsencrypt service playbook
19:24:16 that's so cool
19:24:16 (yeah, there's some examples in testinfra too of various flags to connect for testing)
19:24:48 it is an amazing emergent behavior
19:25:12 (granted one which took some initial effort to support i'm sure)
19:25:26 yeah, looks like it's completely automatic because of playbooks/zuul/templates/group_vars/letsencrypt.yaml.j2
19:25:45 (a group-var which applies to all letsencrypt hosts but is only included in ci)
19:25:59 Semi related to ^ I've been working with our docker + ansible + testing stuff recently in the refstack rebuild and it's really neat how we've built something that is testable end to end.
19:26:18 clarkb: ++
19:28:16 i feel like it may be time for a new conference talk, because "here's how you can test your app end-to-end from source code change through gitops to containerized deployment (with certs!)" is like catnip for sysadmins.
19:28:43 throw speculative dns in there and you'd really have something.
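For reference, the CI-only behavior described above boils down to a single group var. A minimal sketch of what that looks like; the flag name is an assumption, and the authoritative source is playbooks/zuul/templates/group_vars/letsencrypt.yaml.j2 in system-config:

    # CI-only group_vars applied to every host in the letsencrypt group.
    # With this set, the LE roles skip the real acme.sh flow and drop
    # self-signed certs into the same paths a production run would use,
    # so the apache config under test is unchanged.
    letsencrypt_self_sign_only: true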
;)
19:29:20 the really great non-obvious thing is it means people can push changes and have a fair degree of confidence that they will work with having root or setting up a complex local system
19:29:35 [without] ^
19:30:38 right, without
19:30:52 Anything else on the subject of updating our config management tooling?
19:32:06 #topic General Topics
19:32:17 We made great progress on further removing Trusty last week
19:32:43 status.openstack.org has been upgraded. It does need a gerritlib release to get the gerrit stream-events portion of elastic-recheck's bot working again
19:32:47 but otherwise it is happy
19:32:49 cool
19:33:08 I was planning to do a gerritlib release on Thursday (when I'll have time to keep an eye on it). If someone else wants to do it before then feel free
19:33:10 fwiw - review-dev01.opendev.org is running Xenial, since that's what review01.openstack.org is running
19:33:16 mordred: ++
19:33:47 fungi reported success logging into a wiki-dev with some more recent updates
19:33:59 fungi: are we at the point where that is actually automatable to get a working wiki server?
19:34:45 not yet, at least not any more "working" than the current state of wiki-dev.o.o
19:35:01 some of the plugins aren't operable yet so they need individual troubleshooting
19:35:07 I see
19:35:14 particularly the openid login isn't displaying
19:35:21 login link
19:35:36 directly navigating to the url to login works now though?
19:35:47 i haven't tested that yet
19:36:00 also there are a few spam control related config options set on wiki.o.o which need to get copied to the template in the puppet module
19:36:22 are there any outstanding changes that need review?
19:36:43 i think we got all the ones from before i left for fosdem merged. hopefully i'll have more up this week
19:36:57 sounds good, thanks
19:37:30 I started looking into a refstack.o.o upgrade last week as well. I quickly realized that a lot of the puppet just wouldn't work anymore because of changes to nodejs and friends on ubuntu
19:38:10 rather than rewrite a bunch of puppet I started looking into deploying refstack with docker and ansible instead. refstack has a dockerfile, but I can't make it work because it accesses files outside of its Dockerfile dir. Also it is a fat container with nginx, mysql, and refstack all running in one container.
19:38:31 What I've done instead is build it on top of our python-builder + python-base images. To get that to work I need changes in refstack to support python3.7
19:38:57 ++
19:38:59 Those changes appear to be fairly minimal but we still need someone to maintain the software for this to work. Sorting out if those people exist is where I am at now
19:39:03 * mordred supports clarkb's effort
19:39:22 It sounds like there is probably still interest and I can probably get some reviews (maybe I have them already, haven't checked today yet)
19:39:58 If I can get those changes landed then my next step is adding an apache proxy to my change with LE support baked in. Then we schedule a downtime, do a cutover, and copy database stuff
19:40:19 #link https://review.opendev.org/#/c/705258/ if you are curious to see what that looks like in its current form
19:41:21 And that takes us to static.opendev.org progress
19:41:28 ianw: ^ I think you have all the updates on this?
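For reference, the refstack image build mentioned above follows the standard python-builder/python-base pattern used elsewhere in our images. A minimal sketch under that assumption, with the service entrypoint deliberately omitted since that wiring is still being worked out:

    # Build wheels (and their bindep-declared build deps) in a throwaway stage
    FROM docker.io/opendevorg/python-builder as builder
    COPY . /tmp/src
    RUN assemble

    # Then install only the built wheels into a slim runtime image
    FROM docker.io/opendevorg/python-base
    COPY --from=builder /output/ /output
    RUN /output/install-from-bindep
    # entrypoint / wsgi wiring intentionally left out of this sketch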
19:41:49 yeah i have a few things ready for review
19:42:01 #link https://review.opendev.org/#/q/topic:static-services+(status:open+OR+status:merged)
19:42:23 in short, this renames the upload-afs role to upload-afs-roots, and adds a new upload-afs-synchronize role
19:42:57 upload-afs-roots is very helpful for doc updates where you want to keep various directories but not others, upload-afs-synchronize is more a straight copy, better for things like tarballs
19:43:23 also, thanks to corvus and clarkb for fixing my oops destroying the zuul kerberos keytab by issuing a new one
19:43:44 i have a plan in
19:43:47 #link https://storyboard.openstack.org/#!/story/2006598
19:43:51 "improved security posture due to unscheduled key rotation" :)
19:44:16 \o/
19:44:17 task 38607 to add a new keytab for project.tarballs, if anyone would like to comment
19:44:45 #link https://review.opendev.org/704913
19:44:50 ianw: do we need to do a careful rollout of those role changes to avoid breaking jobs?
19:44:54 is also part of this but can go in, to set up the servers
19:45:00 (that might already be encoded in the changes?)
19:45:16 clarkb: at this point, no, it's not a big switch
19:45:28 it's worth noting that in kerberos, whenever you issue a keytab, you invalidate old keytab versions of the same principal. that's a non-obvious side effect. we've added warning boxes to the docs to remind folks.
19:45:29 we have just one testing job on project-config that actually uses the secret and uploads
19:45:41 got it
19:46:24 after this is all in though, yes changes to the base jobs to publish will want careful attention, and synchronizing switching dns for tarballs..org
19:47:00 i think that's it for that, really just reviews right now
19:47:33 thank you for the update. /me makes a note to try and review those
19:47:53 Next an update on the new arm64 CI cloud
19:48:15 unfortunately it looks like nb03.o.o is still unable to talk to us.linaro.cloud. But kevinz is back from holidays and aware of the problem
19:48:28 ianw: ^ have we heard anything new on that?
19:48:56 yes we've corresponded some, he has confirmed the issue but has no root cause nor ETA for fixing it
19:49:18 at this point, we may have to consider bringing up a nodepool builder in the us cloud itself that just uploads there
19:49:29 that seems like a reasonable workaround
19:49:34 (although, now i think about it, i haven't tested if *it* can talk to london ...)
19:49:56 ya we might need separate builders :/
19:50:13 not great since we want synchronized images, but workable
19:50:52 wget https://uk.linaro.cloud:5000 works on the us mirror ... how odd
19:51:08 probably a firewall somewhere
19:51:09 so i guess we could move the builder to the us cloud, and it may work everywhere?
19:51:19 ianw: ya if you can hit uk from us then probably that would work
19:52:08 ok, i can look into that path to get it going
19:52:30 Last item on the agenda is an update on the airship CI cloud. The change to start uploading images there just merged. I've also laid out a rough nl02 config for what I think this looks like. Basically two pools. One with generic resources and the other with airship's more special resources.
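A very rough sketch of that two-pool nl02 layout follows; the provider name, cloud, flavors, label names, and max-servers values are all placeholders rather than the real config, and the second pool still depends on the input mentioned below:

    providers:
      - name: airship-cloud          # placeholder provider name
        cloud: airship
        region-name: RegionOne
        diskimages:
          - name: ubuntu-bionic
        pools:
          - name: main               # generic resources
            max-servers: 10
            labels:
              - name: ubuntu-bionic
                diskimage: ubuntu-bionic
                flavor-name: standard-8
                key-name: infra-root-keys
          - name: airship            # airship's more special resources
            max-servers: 5
            labels:
              - name: ubuntu-bionic-expanded
                diskimage: ubuntu-bionic
                flavor-name: highmem-8
                key-name: infra-root-keys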
Currently the more special resources are modelled after fortnebula's extra memory labels
19:52:39 #link https://storyboard.openstack.org/#!/story/2007195
19:52:46 ^ story on arm64 cloud, will add things there as we go
19:53:02 I've got an email out to roman to provide a bit more input on what we need in that second pool and from there I'll work with the cloud to get quotas updated
19:53:19 I do still need to build a mirror but since they are all in europe I figured getting a head start on the quota related stuff would be a good idea
19:54:40 And that concludes the planned agenda
19:54:43 #topic Open Discussion
19:54:48 Anything else to bring up?
19:56:22 * fungi has nothing, just excited to catch up on everything that's happened over/around the weekend
19:56:46 fungi: any chance you can send out a fosdem report covering zuul topics?
19:58:42 i have it on my to do list, yes
19:58:57 though the short version is: people like our stickers
19:59:10 (and that causes them to pick up literature and learn more)
19:59:17 (and to ask about zuul)
19:59:31 there were talks too, right?
19:59:36 also our table was placed adjacent to the jenkins table both days
20:00:10 and we are at time now
20:00:11 there were no zuul-specific talks, there was going to be one but mnaser fell ill and the rest of us didn't find out until his talk was cancelled so couldn't fill in for him
20:00:12 thank you everyone
20:00:28 We'll see you next week
20:00:33 thanks clarkb!
20:00:45 and feel free to continue discussion in our normal conversation channels (irc and mailing lists)
20:00:48 #endmeeting