19:01:16 #startmeeting infra
19:01:16 Meeting started Tue Aug 24 19:01:16 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:16 The meeting name has been set to 'infra'
19:01:22 #link http://lists.opendev.org/pipermail/service-discuss/2021-August/000278.html Our Agenda
19:01:28 #topic Announcements
19:01:33 o/
19:01:37 Zuul is now on our matrix homeserver at #zuul:opendev.org
19:01:55 this means you should wander over there if you have zuul issues/questions/concerns or just want to say hi :)
19:02:11 also the matrix hosting info is in the normal location for that sort of thing, should hosting questions come up
19:02:48 OpenStack's feature freeze begins next week
19:02:59 o/
19:03:06 I expect there will be a big rush of changes as a result next week, which may make zuul restarts annoying.
19:03:17 But otherwise try to be aware of that as we make changes so we don't impact it too much
19:04:15 Also I'll be taking Monday off to see if I can find any salmon
19:04:22 also we have a new (minor) gerrit version since last meeting, yeah?
19:04:48 oh yup. Upgraded as gerrit made some minor updates for security concerns (neither seemed to directly affect us) and we wanted to keep up to speed with that
19:05:18 g'luck with the salmon hunt
19:06:01 #topic Actions from last meeting
19:06:06 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-08-17-19.01.txt minutes from last meeting
19:06:14 There weren't any actions recorded
19:06:17 #topic Specs
19:06:22 #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement
19:06:44 I'm still looking for feedback on this change if you have some time over tea :)
19:06:54 and we can close out the matrix spec now, or do we still have remaining tasks there?
19:07:19 Did the spec say we should implement a meetbot or not? if not I think we can close it
19:07:32 we have users relying on it at this point anyway, so it's in production (but not with complete bot feature parity)
19:07:55 yup, I think as long as we didn't promise that extra bot functionality as part of the initial spec then we are good to mark it completed
19:08:01 was anyone able to confirm if my status notice went through to the zuul channel? i wasn't watching at the time
19:08:12 fungi: I don't think it did. I think there isn't a status bot either
19:08:40 so anyway, yeah, i guess if those bits are called out in the spec then we need to keep it open for now
19:09:10 I'll check the spec after the meeting and lunch and can push a completion change if we didn't promise those; otherwise we can talk about that next week in our meeting I guess
19:09:23 in order to plan finishing that effort up
19:09:54 Status bot is in scope
19:10:08 Meetbot not in scope
19:10:15 thanks for confirming
19:10:50 ok, let's move on and we can talk about statusbot once we get through everything else
19:11:01 #topic Mailman server upgrades
19:11:20 So we're a little off script there but not too important.
19:11:29 fungi and I have scheduled a time for the lists.kc.io upgrade with the kata folks. This will be happening Thursday at 16:00 UTC
19:11:41 based on testing we expect things to function post-upgrade and the upgrade itself to take about 2 hours
19:11:45 kudos to debian/ubuntu packaging here, the in-place upgrade from xenial to focal was quite smooth
19:12:10 (with a stop off in bionic)
19:12:31 we'll turn off services on the instance, snapshot it, make sure it is up to date and rebooted. Then run through the in-place upgrades to bionic and then to focal
19:13:19 worth noting, exim service disablement gets undone when the package switches from the sysv-compat to the systemd service unit
19:13:20 I've also checked that our zuul jobs are happy running our ansible against focal, and they are, so once we've finished the upgrade the config management should continue happily
19:13:57 we could probably preemptively add a disable symlink for the service unit now that we know that?
19:13:58 #topic Improving OpenDev's CD throughput
19:14:07 #undo
19:14:07 Removing item from minutes: #topic Improving OpenDev's CD throughput
19:14:18 oh, sorry, didn't mean to derail the agenda
19:14:36 fungi: that is a neat idea. I'm happy not doing that given it worked out ok on the test system, but if you want to try that I'm not opposed
19:14:57 i'm up for trying it on lists.k.i since it's sort of a trial run for lists.o.o anyway
19:15:13 the risk of it breaking anything seems negligible
19:15:15 yup
19:15:47 I'll try to remember to put that in my planning doc when I write that up tomorrow
19:15:55 #topic Improving OpenDev's CD throughput
19:16:16 Not much new to say here as I don't think anyone has started doing the mapping (currently I have that scribbled on my todo list for Friday)
19:16:31 But there is one change on the "simple things we might be able to do up front" list:
19:16:32 #link https://review.opendev.org/c/opendev/system-config/+/805487 Be more specific about when inventory changes force jobs to run.
19:17:54 I suspect that this is safe, but I'm worried we might have situations where we do want to run jobs if the inventory/service/ paths update. But so far the only example of that I have been able to come up with is letsencrypt, and I think that is unchanged by mordred's change
19:18:19 which means as long as we match on inventory/service/thisservice and inventory/base that change should be ok
19:18:43 i've approved it. we can always add specific files to those
19:18:57 as we spot holes
19:18:58 ya we can do that as well
19:19:37 And like I said I'm hoping to start mapping out the jobs on Friday based on my current todo list
19:19:38 yeah that feels safe
19:20:01 Anything else to add on this subject?
19:20:02 i intend to help with the mapping exercise
19:20:09 remind me when you get going on it
19:20:22 will do, and thanks
19:21:11 #topic Gerrit Account Cleanups
19:21:32 It has been 3 weeks since I disabled the last batch of users. I intend to delete the conflicting external ids from those retired user accounts tomorrow
19:22:16 That will leave us with ~30 accounts where reaching out to the users is a good idea. Hopefully that will make for a good activity during the openstack feature freeze slush. My goal there is to push a single change to the external ids ref that addresses all of those and gets verified by our running gerrit server too
19:23:00 I'll probably create a working checkout of the ref on review02 and start committing to it. It isn't clear to me if I can make 30 separate commits or if I need to have one. I will probably try doing 30 separate ones first and then squash them all if pushing the stack is rejected by gerrit
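Roughly, the workflow described above would look something like the following sketch. The clone source is an assumption (the location of the All-Users repo on review02 isn't stated here), and pushing through the running gerrit is what would let its validators verify the result:

    # $ALL_USERS: wherever the All-Users repo is reachable (assumption)
    git clone "$ALL_USERS" external-id-cleanup
    cd external-id-cleanup
    git fetch origin refs/meta/external-ids
    git checkout -b cleanup FETCH_HEAD
    # edit/remove the conflicting external id entries, one commit per account
    # (or squash into a single commit if gerrit rejects the multi-commit push)
    git push origin HEAD:refs/meta/external-ids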
19:23:35 they'll probably need to be squashed, i expect gerrit to validate each change individually
19:23:52 er, i guess they're just commits not changes, so maybe not
19:24:07 yup, I'm not sure how it will handle that
19:24:08 either way, i agree with the plan
19:24:52 #topic Gitea 1.15.0 upgrade
19:25:09 Gitea 1.15.0 exists now.
19:25:21 there are a number of bugs they are looking at addressing that people have reported against 1.15.0
19:25:25 #link https://github.com/go-gitea/gitea/milestone/97
19:25:44 #link https://github.com/go-gitea/gitea/issues/16802 Primary Keys for mysql tables
19:26:00 issue 16802 in particular seems important for us since we run with mariadb
19:26:25 Since we're entering openstack feature freeze anyway and don't have a pressing need to upgrade right now, I think it would be best to see if we can wait for a 1.15.1
19:26:35 #link https://review.opendev.org/c/opendev/system-config/+/803231 WIP'd until we are happy with the gitea situation.
19:27:16 That all said, one of the changes we will have to deal with is that the hosting path for the logo files changes. Our Gerrit theme consumes the opendev logo from gitea and that path needs to update. The change above does update that path, but we may need to restart gerrit after the gitea upgrade too
19:27:37 ianw: you had talked about looking into hosting those files in a better way. Have you had a chance to look at that yet?
19:27:44 oh yeah, i was thinking about that
19:27:56 the expedient approach seems to be to dump the files in an afs directory
19:28:00 If we don't address that before the gitea upgrade I think that is fine as a gerrit restart is quick, but I do want to bring up the relationship between the tasks here
19:28:08 ianw: oh interesting, and then serve them off of static?
19:28:15 the "correct" approach seems to be to create a repo to hold logos/favicons/images and get that to publish to afs
19:28:43 i think it's probably worth doing the latter, as logos tend to morph and it's probably worth version tracking them
19:28:52 Most of them are svgs anyway so sticking them in git shouldn't be too bad
19:29:05 then we can create a static site with a long cache, or just serve them via static.opendev.org
19:29:36 if we think the repo and publishing is the way to go, i can do that
19:30:00 we could put those files in system-config though and just include them from there in the images, right? don't need a separate repo?
19:30:05 That plan seems reasonable to me. We'd still be hosting the files away from the services but via a webserver we have far more control over.
19:30:49 i guess i'm lost on why it needs its own repo
19:30:56 fungi: we could ... as in via https://opendev.org/opendev/system-config/... ?
19:31:19 ya the files are already largely in system-config
19:31:20 just to keep it separate really, and to make it simpler to reuse the existing afs publish jobs
19:31:26 yeah, make a new top-level directory... "assets" or whatever
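As a rough illustration of the "static site with a long cache" idea mentioned above (hypothetical only; the AFS path and cache lifetime are made-up examples, not existing configuration):

    # hypothetical apache snippet; requires mod_headers
    <Directory /afs/openstack.org/project/opendev.org/assets>
        Require all granted
        Header set Cache-Control "public, max-age=604800"
    </Directory>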
"assets" or whatever 19:32:27 i guess if the idea is to use it in publication jobs rather than putting it in the service images (so the files are hosted from the servers embedding them in their services) then having a repo using standard docs publishing jobs makes some sense 19:33:03 my main concern with putting them on static.o.o is that if that site is down it could cause problems for the services pointing at it for inlined files 19:33:24 (or if afs is offline, suddenly that's an impact to all our services) 19:33:40 if the images get written into the images we build, the services are more self-contained 19:34:02 yup and for some services like gitea I think we'd continue to embed because they expect to be themed that way and not with external resources 19:34:19 er, graphical images get written into the container images we build (too many meanings for image) 19:34:27 for that reason having the assets in system-config probably makes the most sense so we don't have multiple copies of files? though we could clone and copy in the image builds from the new repo 19:34:49 i don't think that failing tags will bring down services, so i don't think that's a blocker as such 19:35:05 ianw: probably not. might make things render funny for a bit but not fatal 19:35:23 right, i don't mind having multiple places the same image is served, the desire was to have only one version-controlled source for the images themselves 19:35:57 so we don't have to manually copy the same favicon to multiple directories/repositories or whatever 19:36:28 and worry about those different copies getting out of sync when we update one repo but not another 19:36:44 ianw: do you think it is possible to keep them in system-config under an assets/ dir? 19:36:53 right, so the canonical favicon could be logos.opendev.org/favicons.ico or https://opendev.org/opendev/system-config/master/assets/favicon.ico you mean? 19:37:10 yeah 19:38:10 note that the fallback location for favicon.ico is implicitly on the same server. only affects older browsers. 19:38:24 the other reason i'm waffling on the serving one copy and having services link browsers out to it is that it looks a little more like webbug/tracker behavior, may set off some privacy extensions, may need cors fiddling... 19:38:26 do we want logos.opendev.org? what needs it? 19:39:29 https://opendev.org/opendev/system-config/raw/branch/master/docker/gitea/custom/public/img/opendev-sm.png that actually does seem to work 19:39:46 in that case I think we don't need logos.opendev.org just a bit better organization? THis may be much simpler than I had anticipated 19:40:05 the counterargument i hear from webdev people is that having one location a particular file is served is that it speeds up user interaction due to browser-side caching (this is the argument pushed for using google's font service, for example). i'm not really sure i buy it for this case though 19:40:16 that may load a bit more slowly than hitting https://opendev.org/img/opendev-sm.png when caches are cold though 19:40:48 when we address things in the repo gitea does repo lookups and that will be slower on a cold cache, but once gitea's cache warms it should be about equivalent 19:41:31 maybe start with that since it doesn't require as many changes? 19:41:45 and if we find things aren't served quickly enough we can make a logos.opendev.org with better lookup times? 
19:42:05 as for logos.opendev.org, maintaining yet another site does seem like unnecessary administrative overhead to me, if the real goal is just to deduplicate the graphics in our git repo(s)
19:42:21 lodgeit is the only one i can think of https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/lodgeit/templates/docker-compose.yaml.j2#L36
19:42:51 gerrit does similar as well
19:43:00 but they could both use https://opendev.org/opendev/system-config/raw/branch/master/docker/gitea/custom/public/img/opendev-sm.png type urls?
19:44:01 i guess one goes via gitea's static rendering (/img) and the other via actually hitting the git tree
19:44:44 if the plan is to serve them out of the repo, it would be prudent to put them in a dedicated directory with a README telling folks of the dangers of moving any files
19:45:08 corvus: yes, I don't think we should use the actual url above if we use that method. We should put them in an assets/ type dir
19:45:14 but, if we simply ln -s ./assets/opendev-sm.png docker/gitea/custom/public/img/opendev-sm.png then the image builds and the png is statically served?
19:45:34 oh ... that will not be in the build context and probably won't work
19:46:17 ya, I think the difficult thing to sort out is how to keep copying the files into the docker images for services that expect that method of theming
19:46:29 I'm sure there is some way to do it that makes docker happy but I don't know what it is
19:47:27 i feel like bazel had this issue too with symlinks to ~/.cache or something
19:48:20 oh, so the problem is that we don't have any pre-build step to copy other files into the tree
19:48:58 ya, docker doesn't like things that escape the path level it is started at
19:48:59 and for people trying to build those locally they'd need to perform an extra step to get the files which aren't present in that subdirectory where the rest of the files for the image reside?
19:49:15 and docker builds can't fetch files from some network location
19:49:30 docker builds can fetch the files from a network location. I think that is the common workaround here
19:49:35 or otherwise install things which involve getting them over a network?
19:49:42 but our speculative builds make that more complicated
19:49:48 yeah, agreed
19:50:20 the network location would be a file:/// path within the same repo or another repo checked out in the workspace as pushed by zuul, i guess
19:51:14 We definitely don't need to solve that right here in the meeting (which only has a few more minutes allocated) but I suspect solving the "how do the graphical images get into the docker image builds" question is the main one to solve for now?
19:51:28 there might be an escape hatch override in docker we can use too. I'm not familiar enough to know for sure
19:51:59 maybe give me an action to come up with something
19:52:08 a separate repo could solve that by us basically punting on the idea that we care about speculative executions of those
19:52:16 then image builds just clone the repo and use the current state
19:52:50 #action ianw Figure out how to centrally store graphical images for logos while still making them consumable by docker image builds for services that want a copy of the data at build time.
19:52:54 something like that?
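To illustrate the build-context limitation being discussed (the paths are examples, assuming the gitea image is currently built with docker/gitea/ as its context):

    # with the context at docker/gitea/, anything above that directory is
    # invisible to the build, so neither of these can reach a top-level assets/:
    #   COPY ../../assets/opendev-sm.png custom/public/img/opendev-sm.png
    #   ln -s ../../assets/opendev-sm.png docker/gitea/custom/public/img/opendev-sm.png
    docker build docker/gitea/
    # pointing the context at the repo root instead makes the whole tree,
    # including assets/, available to COPY in the Dockerfile:
    docker build -f docker/gitea/Dockerfile .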
19:53:07 but if all our builds are driven from system-config then there's no need to put the files in another repo than system-config to get speculative support
19:53:19 fungi: except you have to get those files into the docker image
19:53:35 and having them in another repo makes that easier than having them in the same repo?
19:53:38 file:/// doesn't work and neither does cp ../../assets
19:53:42 kinda bizarre if so
19:53:50 fungi: ya, because you can git clone that and the repo is as small as necessary to do that
19:54:03 fungi: rather than git cloning yourself from somewhere that isn't the actual state you expect
19:54:15 or not using git at all, just a relative filesystem path
19:54:30 but you can't do that within an image build as it is a security concern (if my understanding is correct)
19:54:45 docker limits you to things at and below the Dockerfile in the fs tree
19:54:51 oh, i see, we could clone system-config and use files from it
19:54:55 yes
19:55:07 so still don't need a separate repo
19:55:21 right, but you might end up confused in a speculative state situation
19:55:26 just a second system-config clone from local path during the build
19:55:42 cloning from the same repo we're already using
19:55:48 fungi: can you do that within a git repo?
19:56:02 technically it's the build context, not the dockerfile, but yeah
19:56:04 (I don't actually know, I've never tried cloning a git repo to a subtree of the git repo)
19:56:42 corvus: ah ok, maybe we shift the context to the root of the repo then rather than the dockerfile location
19:56:56 you can clone your current repo within a subdirectory of your repo, yep, just tested it
19:56:58 then we would have access to everything in the git repo and not need to sort out cloning a repo into itself
19:57:06 fungi: neat
19:57:10 git clone ./ foo
19:57:41 Alright, we are just about at time. Let's open it up to anything else really quickly
19:57:45 #topic Open Discussion
19:57:56 Any last-minute items to call out before we go find lunch/dinner/breakfast?
19:58:12 #link https://review.opendev.org/c/opendev/system-config/+/804916
19:58:16 was from last week
19:58:26 to fix the time randomisation on backups
19:58:34 oh, thank you for that. I missed the change
19:58:43 I guess it went up during the meeting last week :)
19:58:51 i'll get back to debian-stable removal
19:59:07 in trying to figure out nodes/jobs etc i got quite side-tracked into updating how zuul shows that info :)
19:59:42 those improvements do make the info zuul presents more readable. Thank you for that
19:59:58 thanks clarkb!
20:00:04 oh, and i don't think we got a response on the openeuler mirror?
20:00:16 ianw: I haven't seen one. But not sure I was cc'd on the original send?
20:00:17 * fungi has to skeedaddle
20:00:26 aha, infra-root was cc'd, I see that now, and no response
20:00:40 thanks everyone, we are at time
20:00:43 #endmeeting