19:01:16 <clarkb> #startmeeting infra
19:01:16 <openstack> Meeting started Tue May 26 19:01:16 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:19 <openstack> The meeting name has been set to 'infra'
19:01:44 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000029.html Our Agenda
19:01:46 <ianw> o/
19:01:51 <clarkb> #topic Announcements
19:02:01 <clarkb> The PTG is running next week
19:02:08 <zbr> o/
19:02:17 <AJaeger> o/
19:02:19 <clarkb> we'll talk more about that later in the meeting, but wanted to make sure people were aware that was going to happen next week
19:02:46 <clarkb> #topic Actions from last meeting
19:02:52 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-05-19-19.01.txt minutes from last meeting
19:03:05 <clarkb> There were no recorded actions, though I think a fair bit has happened so let's continue and discuss
19:03:10 <clarkb> #topic Priority Efforts
19:03:15 <clarkb> #topic Update Config Management
19:03:40 <clarkb> mordred: corvus: fungi: I think we have learned new things about using Zuul for CD thursday-friday ish?
19:04:13 <clarkb> specifically how requiring base and things like LE before running other services can have cascading effects. This is something ianw has pointed out before and has proposed a potential workaround
19:05:11 <corvus> a new insight is that since sometimes we have the run-zuul job run without the other base jobs, then we can short-circuit the protections afforded by running it in sequence after the base jobs
19:05:24 <mordred> yeah. that was less awesome
19:05:48 <corvus> clarkb: what's the ianw fix?
19:05:52 <mordred> clarkb: I've also got an idea about splitting base into per-service-base
19:05:54 <corvus> also, mordred had a proposal he was noodling on
19:06:06 <mordred> that I wanted to check with people about today and then if it's ok with people I'll write it
19:06:31 <clarkb> corvus: https://review.opendev.org/#/c/727907/ its a potential fix for unreachable hosts (basically ignore them). That is a subset of the issues we ran into
19:07:19 <clarkb> I pointed out that may not be safe given that we sometimes do expect things to run in order. However ignoring things like that specifically in base may be ok. My biggest concern with base there would be ignoring a host and not setting up its firewall properly, then that host somehow working a few minutes later when services are configured
19:07:23 <mordred> the idea in a nutshell is - make per-service base playbooks that run the base roles, move the soft-depend from service-playbook->base to service-playbook->service-base - and add appropriate file matchers
19:07:54 <mordred> this allows us to still run base things before services - but limits the success-needed scope to the hosts involved in the given service
19:08:01 <mordred> so if paste is unreachable, we don't skip running zuul
19:08:11 <mordred> but - we WILL skip running paste if paste-base fails
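A rough sketch of what option (b) could look like as Zuul configuration follows; the job names, pipeline name, and file matchers here are illustrative assumptions, not the actual system-config layout.

```yaml
# Hypothetical per-service base job plus the soft-dependency wiring.
- job:
    name: infra-prod-service-zuul-base        # assumed name for "zuul's base"
    files:
      - playbooks/host_vars/zuul01.openstack.org.yaml
      - playbooks/group_vars/zuul.yaml
      - playbooks/base.yaml

- project:
    deploy:
      jobs:
        - infra-prod-service-zuul-base
        - infra-prod-service-zuul:            # assumed name for the service job
            dependencies:
              - name: infra-prod-service-zuul-base
                soft: true                    # only wait on it when it actually runs
```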
19:08:23 <corvus> mordred: what about running the base role in the normal service playbook, and putting limits on all the service jobs?
19:08:29 <mordred> we do this rather than just putting base roles into every service playbook to avoid no-op installing users for 5 minutes on every service
19:09:36 <corvus> mordred: (ie, "limit: zuul" on run-zuul -- would that waste too much time running the base role?)
19:09:42 <mordred> corvus: those were the two other ideas - base role in normal service I don't like because most of the base stuff moves very slowly but does take a non-zero amount of time to do nothing on each service invocation - and with zuul file matchers we can reduce the number of times we run service-base
19:10:21 <corvus> i don't even really know why the base job ran for my change series :(
19:10:22 <mordred> corvus: and yeah - I think we could run base.yaml --limit zuul ... but on friday when we were talking about it it seemed like we might have a harder time understanding exactly what was going on with that approach
19:10:28 <clarkb> my biggest concern with the above proposal is it seems like it could get very complicated quickly
19:10:46 <clarkb> corvus' suggestion is much simpler but might be less efficient
19:10:56 <clarkb> s/might/very likely/
19:11:07 <mordred> there are like 4 proposals - sorry, can we be more specific
19:11:23 <mordred> let me name them:
19:11:57 <mordred> a) ignore unreachable b) service-base c) base-in-service d) --limit
19:12:59 <clarkb> b seems like it will have many complicated file matcher rules and it will be more difficult to understand when things will run. c seems much simpler (that is what my comment above was trying to say)
19:13:26 <mordred> awesome. and yes - I agree - I think c is the simplest thing to understand - although it might be less efficient on a per-service basis
19:13:49 <ianw> clarkb: "would be ignoring a host and not setting up its firewall properly, then that host somehow working a few minutes later when services are configured" ... the intent at least was to ignore the unreachable host (so it would presumably be unreachable for the service playbook); if it was an error that's a different exit code so service playbook stops
19:14:05 <corvus> because the file matchers for b would be the union of the matchers for today's "base" and today's "service" playbooks?
19:14:11 <ianw> not that i'm really arguing for a)
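For reference, the Ansible keyword at the heart of option (a), shown as a minimal sketch of the general mechanism rather than the exact contents of 727907:

```yaml
# Minimal sketch: let a play keep going when some hosts are unreachable,
# while real task errors still fail as usual.
- hosts: all
  ignore_unreachable: true     # Ansible >= 2.7 play/task keyword
  roles:
    - users
    - iptables
```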
19:14:32 <mordred> corvus: I think we could trim down the matchers for today's "base" - they're currently a bit broad
19:14:35 <mordred> but yes
19:14:42 <mordred> well ...
19:14:44 <mordred> no
19:15:11 <mordred> service-base would be a per-service subset of today's base file matchers - but would not need to include a service's file matchers
19:15:31 <mordred> today I think we do base on inventory and playbooks/host_vars
19:15:43 <mordred> if we did per-service base, we could file-matchers on specific host_vars
19:15:46 <mordred> like we do for service
19:15:56 <mordred> but we can also do that with c)
19:15:57 <clarkb> right it would be host_vars/service*.yaml and group_vars/service.yaml rather than host_vars/* group_vars/* sort of thing right?
19:16:04 <mordred> yeah
19:16:19 <corvus> what's the job name for base today?
19:16:28 <corvus> (cause it's not "runbase")
19:16:33 <clarkb> infra-prod-run-base iirc
19:16:49 <corvus> no that's "run base on pretend nodes for testing"
19:16:57 <corvus> not "actually run base on real nodes"
19:16:57 <mordred> corvus: infra-prod-base
19:17:33 <mordred> and it's triggered on all of inventory/ playbooks/host_vars/ and playbooks/group_vars/
19:17:43 <mordred> (as well as base.yaml and the base roles)
19:18:00 <corvus> that explains why it ran for my changes; it's hard to change stuff without changing host_vars
19:18:05 <mordred> yeah
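To make that trigger breadth concrete, roughly what it amounts to in the job definition (regexes illustrative; the real list lives in system-config):

```yaml
- job:
    name: infra-prod-base
    files:
      - inventory/.*
      - playbooks/host_vars/.*     # why nearly any host_var change runs base today
      - playbooks/group_vars/.*
      - playbooks/base.yaml        # plus the base roles themselves (paths omitted here)
```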
19:18:14 <clarkb> thinking out loud here, I think I'd be happy to try b) with a fallback of c) if we find it is too complicated
19:18:27 <clarkb> mostly wanted to call out the potential for complexity early so that we can try and minimize it
19:18:30 <mordred> ++ ... I think the file matchers are going to be very similar for b) and c)
19:18:53 <mordred> how about I try it for one service and we can look at it
19:19:03 <mordred> and if it's too complex, we go with c)
19:19:09 <clarkb> wfm
19:19:18 <corvus> iiuc, b) we'll have a set of like 4-6 file matchers for each service-base job
19:19:24 <corvus> they'll basically look the same
19:19:38 <corvus> except the specific groups and hostvars files/dirs will be service related
19:19:49 <mordred> yeah. and there's not an awesome way to de-duplicate that
19:19:58 <corvus> so i think it'll be *verbose* but maybe not *complicated*
19:19:59 <ianw> is it correct to say c) means basically every playbooks/service-*.yaml will have a pre_task of include_tasks: base.yaml?
19:20:01 <mordred> corvus: ++
19:20:11 <mordred> ianw: basically yeah
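A minimal sketch of option (c), with the base work folded into the service playbook itself; host and role names are illustrative:

```yaml
# playbooks/service-zuul.yaml (sketch)
- hosts: zuul
  roles:
    - users          # base roles run first, every time the service playbook runs
    - iptables
    - zuul           # then the service's own role(s); name assumed
```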
19:20:23 <mordred> corvus: I just had an idea ...
19:20:58 <mordred> corvus: I *think* normal ansible supports roles in subdirs referenced with dot notation... maybe we put our base roles in playbooks/roles/base - so we can make the file matchers list smaller
19:21:47 <mordred> so it would be roles: - base.users
19:22:13 <mordred> I can do a test of that too and see if it works- could allow us to shrink the file matchers
19:22:28 <corvus> sounds good
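The layout 730937 is testing, sketched out; whether the dot-notation reference (e.g. roles: - base.users) actually resolves is exactly what the WIP change is meant to find out:

```yaml
# Proposed tree (sketch):
#   playbooks/roles/base/users/
#   playbooks/roles/base/iptables/
#   ...
# which would let a single Zuul file matcher cover every base role:
- job:
    name: infra-prod-base
    files:
      - playbooks/roles/base/.*
```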
19:22:58 <corvus> hey
19:23:24 <corvus> i wonder if there's something we can do about the hostvars like that too
19:23:57 <corvus> like, could we segregate the base hostvars in such a way that we can more correctly detect when to run base
19:24:31 <corvus> cause just looking at the roles base runs, they pretty much never change, including the data for them
19:24:49 <mordred> corvus: I think that's a good idea
19:24:51 <corvus> but we need to run the job a lot because of hostvars
19:24:56 <clarkb> corvus: if it recursively looks for matching filenames we might be able to use host_vars/zuul/zuul01.yaml and then just match on host_vars/zuul ?
19:25:03 <clarkb> I have no idea if ansible works that way though
19:25:09 <mordred> I do not think it does
19:25:46 <mordred> oh - you know ...
19:25:47 <corvus> the roles are: users, base-repos, base-server, timezone, unbound, exim, snmpd, iptables
19:25:56 <mordred> yeah
19:26:00 <mordred> remote:   https://review.opendev.org/730937 WIP Move users into a base subdir
19:26:10 <mordred> I just pushed that up to see if subdirs will work for role organization
19:26:28 <mordred> we could maybe make a second hostvar location into which we just put base-associated hostvars
19:26:35 <mordred> maybe inside of the inventory dir
19:26:54 <corvus> yeah, that sounds worth exploring
19:27:20 <mordred> so we could have inventory/host_vars/zuul01.openstack.org.yarml into which we put base hostvars for zuul - and playbooks/host_vars/zuul01.openstack.org.yaml has hostvars for the zuul service playbook
19:27:33 <mordred> but we should write our own plugin that understands .yarml files
19:27:40 <corvus> that's the key
19:27:48 <mordred> I think it's super important
19:27:54 <mordred> similarly ...
19:28:02 <mordred> we could split the inventory into multiple files
19:28:29 <mordred> (we own the plugin that does that anyway)
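Sketching the split being floated here (paths and variable names illustrative): Ansible loads host_vars found next to the inventory as well as next to the playbook, so the two trees can carry different slices of a host's variables and be matched by different Zuul jobs.

```yaml
# inventory/host_vars/zuul01.openstack.org.yaml  (base-only vars; sketch)
iptables_extra_public_tcp_ports: []        # hypothetical base-level variable

# playbooks/host_vars/zuul01.openstack.org.yaml  (service vars; sketch)
zuul_scheduler_start: true                 # hypothetical service-level variable
```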
19:28:36 <clarkb> we tend to not change the inventory super often so that may be a good follow-on?
19:28:40 <mordred> yeah
19:28:52 <mordred> just thinking of ways to simplify matching what to run when
19:29:15 <clarkb> another approach could be to avoid having zuul's job descriptions determine the rules for us
19:29:37 <clarkb> we could have a system that zuul triggered that determined what to execute based on git deltas or similar
19:30:04 <clarkb> (this is me thinking out loud, I think that would take far more effort to get right since we'd be building that from scratch and in many ways it will probably look like what zuul is already doing)
19:30:07 <mordred> I think that would wind up needing to implement something similar to file matchers
19:30:10 <mordred> yeah
19:30:36 <corvus> yeah, though it might centralize it a bit
19:30:52 <corvus> (we can have a job that decides what child jobs to run)
19:31:06 <mordred> that's a good point
19:32:02 <corvus> still, i think b and the related things we discussed are a good place to start
19:32:06 <clarkb> as a time check half our hour is gone now. Do we think mordred's plan to try it out on some services is a good place to start or should we discuss this further?
19:32:17 <clarkb> corvus: ++ from me
19:32:19 <corvus> and we can think about a dispatch job if it gets wild
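If a dispatch job ever does become necessary, Zuul's zuul_return module already supports telling the scheduler which child jobs to run; a minimal sketch (child job name assumed):

```yaml
- hosts: localhost
  tasks:
    - name: Decide which deploy jobs are needed (real logic would inspect the change)
      set_fact:
        needed_jobs:
          - infra-prod-service-zuul

    - name: Skip every child job not listed here
      zuul_return:
        data:
          zuul:
            child_jobs: "{{ needed_jobs }}"
```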
19:32:27 <mordred> I'll get a patch doing b) for service-zuul up after the meeting
19:32:48 <clarkb> mordred: thanks
19:32:54 <clarkb> anything else config management related before we move on?
19:33:15 <mordred> after I get done with that
19:33:20 <mordred> I want to start working on gerrit upgrade
19:33:32 <zbr> super
19:33:45 <clarkb> mordred: exciting, there is a recent thread on gerrit performance post upgrade to notedb that we may want to read over
19:33:51 <clarkb> (as part of upgrade planning)
19:33:54 <mordred> yup. have been reading that
19:34:08 <mordred> some things about frequent aggressive GC
19:34:16 <clarkb> I was somewhat disappointed that some of the response seems to have been "notedb performs less well deal with it"
19:34:42 <mordred> clarkb moore's law will fix
19:35:36 <corvus> yeah, i think notedb perf depends a lot on caching
19:36:40 <clarkb> #topic OpenDev
19:36:54 <clarkb> I've sent out the call for advisory board volunteers.
19:37:00 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000026.html Advisory Board thread. We have some volunteers already!
19:37:21 <clarkb> we've gotten a couple responses so far which is good considering there were holidays in many parts of the world recently
19:38:00 * hrw joins
19:39:07 <clarkb> I think the next steps here are to give people enough time to catch up on email, and work through how various groups want to select membership. But in a few weeks I'm hoping we can select a simple operating mechanism for reaching out between the groups (admins and advisory board)
19:39:23 <clarkb> I suggested a simple mailing list subject tag, but we'll see if that makes sense once we've got a membership
19:39:49 <clarkb> any questions or concerns on this?
19:40:09 <fungi> the subject tag may also be more trouble than it's worth until volume ramps up on that ml (assuming it ever does)
19:40:34 <clarkb> fungi: ya that's possible, I can see it being nice for client side filtration too though
19:40:46 <clarkb> (even if you are getting all of the emails)
19:41:37 <fungi> it's also something we could talk about during the ptg
19:41:49 <clarkb> ++
19:42:28 <clarkb> #topic General topics
19:42:40 <clarkb> #topic pip-and-virtualenv next steps
19:43:00 <clarkb> ianw: I kept this on the agenda as I didn't see announcement of the changes we had planned. Anything we can do to help or is it simply a matter of time now?
19:43:19 <ianw> i've been working on the announcement at https://etherpad.opendev.org/p/rm-pip-and-virtualenv
19:43:31 <ianw> it feels way too long
19:44:00 <fungi> maybe the summary needs a summary? ;)
19:44:15 <clarkb> #link https://etherpad.opendev.org/p/rm-pip-and-virtualenv announcement draft for pip and virtualenv changes
19:44:29 <ianw> fungi: yeah, i'll put a tl;dr at the top
19:45:04 <clarkb> sounds like it is moving along then, should we keep this on the agenda for next meeting?
19:45:13 <corvus> maybe something action focused?  like "no action required unless $foo happens in which case you can $bar"
19:45:55 <corvus> which i *think* is the case :)  -- ie, i think we're telling people they shouldn't need to do anything, but jobs are varied, and if something breaks they have options
19:45:55 <ianw> corvus: good point; i think the main action will be "if virtualenv is not found, install it"
19:46:40 <clarkb> ++ to giving people a shortcut to fixes if they have a problem
19:47:12 <ianw> ok, will do, will ping for reviews on opendev at some point, thanks
19:47:57 <clarkb> #topic DNS cleanup and backups
19:48:13 <clarkb> fungi: I didn't end up sharing the dns zone contents with foundation staff. I think you may have, did that happen?
19:48:33 <fungi> yes, they observed there was a lot they could clean up in there, but no concerns publishing the list of records
19:49:35 <ianw> #link https://review.opendev.org/#/c/728739/
19:49:42 <ianw> is the final job that backs up all the RAX domains
19:50:48 <clarkb> cool
19:50:51 <ianw> should we etherpad the openstack.org zone and we can go through it, maybe put "DELETE" next to things people know can go, then I can clean it up at some point?
19:51:18 <clarkb> ianw: that works for me. We can also share that etherpad with the foundation admins and they can help annotate it too?
19:51:32 <clarkb> though maybe it is better for them to delete the things they know about
19:52:46 <clarkb> #topic Using HTTPS with in region mirrors
19:52:51 <clarkb> moving along as we only have a few minutes left
19:52:56 <clarkb> #link https://review.opendev.org/730861 Test ssl with mirrors via base-test
19:52:57 <ianw> #link https://etherpad.opendev.org/p/rax-dns-openstack-org
19:53:03 <ianw> ^^ to go through
19:53:10 <clarkb> #link https://review.opendev.org/730862 Use ssl with mirrors in production if base-test is happy
19:53:37 <clarkb> I've been pushing this along now that ianw rebuilt all of our mirrors. The big upside to this is we get a bit more assurance that nothing silly is happening with packages since we don't sign them with reprepro
19:53:49 <clarkb> also with pypi it will be a nice to have too since it basically relies on ssl for all its trust
19:54:14 <clarkb> the first change I've linked will update base-test, we can check bindep and things are happy with it, then roll it out globally
19:54:37 <mordred> ++
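The substance of the two linked changes is just flipping the scheme used for the in-region mirror URLs; an illustrative sketch (variable and host names are placeholders, not the real role variables):

```yaml
mirror_fqdn: mirror.dfw.rax.opendev.org                   # placeholder mirror name
pypi_index_url: "https://{{ mirror_fqdn }}/pypi/simple"   # pypi trust rides on TLS
apt_mirror_base: "https://{{ mirror_fqdn }}/ubuntu"       # reprepro repos are unsigned, so TLS adds transport integrity
```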
19:54:46 <clarkb> #topic Scaling Meetpad/Jitsi Meet
19:55:03 <clarkb> my change to configure a jvb server has landed. I think that means we can deploy a jvb server
19:55:08 <corvus> let's do it
19:55:17 <clarkb> This is something I can likely do tomorrow if no one else can do it sooner
19:55:30 <corvus> it'll be good to have that up and running, then we can add others quickly if needed
19:55:46 <clarkb> any new server called jvb01.opendev.org in our inventory should be configured properly (and jvb02, 03, 99 etc)
19:56:01 <fungi> hopefully we don't need more than 100
19:56:27 <fungi> though we probably support an arbitrary number of digits there
19:56:27 <clarkb> I'm totally happy with someone else spinning one up today if they have time. otherwise I've got it on the todo list for tomorrow
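In inventory terms, the point is roughly this (group name and layout are illustrative, not the actual system-config inventory): any host added with a matching jvb name lands in the group the meetpad/jvb playbook targets.

```yaml
all:
  children:
    jvb:                       # hypothetical group the jvb deployment targets
      hosts:
        jvb01.opendev.org:
        jvb02.opendev.org:     # hypothetical second backend; add 03, 04, ... as needed
```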
19:57:28 <clarkb> #topic Project Renames
19:57:40 <clarkb> A reminder that June 12 was pencilled in last week. I think that is still looking ok for me
19:57:46 <fungi> wfm
19:57:57 <clarkb> I'll try to bring it up with the openstack tc soon so that any other renames they want can be included at that time too
19:58:15 <clarkb> #topic Virtual PTG Attendance
19:58:21 <clarkb> #link https://virtualptgjune2020.eventbrite.com Register if you plan to attend. This helps with planning details.
19:58:28 <clarkb> #link https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 PTG Ideas
19:58:38 <clarkb> Please register and feel free to add ideas to our planning document
19:58:50 <clarkb> Any thoughts on whether or not we should have this meeting next week?
19:59:22 <clarkb> unlike a normal PTG we won't be distracted by travel and timezones. The PTG times we've requested do not conflict with this meeting
19:59:57 <clarkb> I guess I can show up and if others do too we'll have a meeting.
20:00:08 <corvus> yeah, maybe play it by ear?
20:00:11 <clarkb> I don't have to go out of my way to be around during that time period
20:00:31 <clarkb> also a reminder that the PTG is happening next week
20:00:53 <clarkb> and that takes us to the end of our hour
20:00:56 <clarkb> thank you everyone
20:00:58 <clarkb> #endmeeting