19:01:16 #startmeeting infra
19:01:16 Meeting started Tue May 26 19:01:16 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:19 The meeting name has been set to 'infra'
19:01:44 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000029.html Our Agenda
19:01:46 o/
19:01:51 #topic Announcements
19:02:01 The PTG is running next week
19:02:08 o/
19:02:17 o/
19:02:19 we'll talk more about that later in the meeting, but wanted to make sure people were aware that was going to happen next week
19:02:46 #topic Actions from last meeting
19:02:52 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-05-19-19.01.txt minutes from last meeting
19:03:05 There were no recorded actions, though I think a fair bit has happened, so let's continue and discuss
19:03:10 #topic Priority Efforts
19:03:15 #topic Update Config Management
19:03:40 mordred: corvus: fungi: I think we have learned new things about using Zuul for CD thursday-friday ish?
19:04:13 specifically how requiring base and things like LE before running other services can have cascading effects. This is something ianw has pointed out before and has proposed a potential workaround
19:05:11 a new insight is that since sometimes we have the run-zuul job run without the other base jobs, we can short-circuit the protections afforded by running it in sequence after the base jobs
19:05:24 yeah. that was less awesome
19:05:48 clarkb: what's the ianw fix?
19:05:52 clarkb: I've also got an idea about splitting base into per-service-base
19:05:54 also, mordred had a proposal he was noodling on
19:06:06 that I wanted to check with people about today and then if it's ok with people I'll write it
19:06:31 corvus: https://review.opendev.org/#/c/727907/ it's a potential fix for unreachable hosts (basically ignore them). That is a subset of the issues we ran into
19:07:19 I pointed out that may not be safe given that we sometimes do expect things to run in order. However, ignoring things like that specifically in base may be ok. My biggest concern with base there would be ignoring a host and not setting up its firewall properly, then that host somehow working a few minutes later when services are configured
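
For reference, a minimal sketch of what the 727907-style approach (option "a" below) could look like; the host pattern and roles here are illustrative, not the actual contents of that change:

    # Sketch only: Ansible 2.7+ lets a play skip unreachable hosts
    # instead of aborting the whole run; genuine task errors on
    # reachable hosts still fail the play as usual.
    - hosts: all
      ignore_unreachable: true
      roles:
        # illustrative subset of the base roles
        - users
        - iptables
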
19:07:23 the idea in a nutshell is - make per-service base playbooks that run the base roles, move the soft-depend from service-playbook->base to service-playbook->service-base - and add appropriate file matchers
19:07:54 this allows us to still run base things before services - but limits the success-needed scope to the hosts involved in the given service
19:08:01 so if paste is unreachable, we don't skip running zuul
19:08:11 but - we WILL skip running paste if paste-base fails
19:08:23 mordred: what about running the base role in the normal service playbook, and putting limits on all the service jobs?
19:08:29 we do this rather than just putting base roles into every service playbook to avoid no-op installing users for 5 minutes on every service
19:09:36 mordred: (ie, "limit: zuul" on run-zuul -- would that waste too much time running the base role?)
19:09:42 corvus: those were the two other ideas - base role in normal service I don't like because most of the base stuff moves very slowly but does take a non-zero amount of time to do nothing on each service invocation - and with zuul file matchers we can reduce the number of times we run service-base
19:10:21 i don't even really know why the base job ran for my change series :(
19:10:22 corvus: and yeah - I think we could run base.yaml --limit zuul ... but on friday when we were talking about it, it seemed like we might have a harder time understanding exactly what was going on with that approach
19:10:28 my biggest concern with the above proposal is it seems like it could get very complicated quickly
19:10:46 corvus' suggestion is much simpler but might be less efficient
19:10:56 s/might/very likely/
19:11:07 there are like 4 proposals - sorry, can we be more specific
19:11:23 let me name them:
19:11:57 a) ignore unreachable b) service-base c) base-in-service d) --limit
19:12:59 b seems like it will have many complicated file matcher rules and it will be more difficult to understand when things will run. c seems much simpler (that is what my comment above was trying to say)
19:13:26 awesome. and yes - I agree - I think c is the simplest thing to understand - although it might be less efficient on a per-service basis
19:13:49 clarkb: "would be ignoring a host and not setting up its firewall properly, then that host somehow working a few minutes later when services are configured" ... the intent at least was to ignore the unreachable host (so it would presumably be unreachable for the service playbook); if it was an error that's a different exit code so the service playbook stops
19:14:05 because the file matchers for b would be the union of the matchers for today's "base" and today's "service" playbooks?
19:14:11 not that i'm really arguing for a)
19:14:32 corvus: I think we could trim down the matchers for today's "base" - they're currently a bit broad
19:14:35 but yes
19:14:42 well ...
19:14:44 no
19:15:11 service-base would be a per-service subset of today's base file matchers - but would not need to include a service's file matchers
19:15:31 today I think we do base on inventory and playbooks/host_vars
19:15:43 if we did per-service base, we could use file matchers on specific host_vars
19:15:46 like we do for service
19:15:56 but we can also do that with c)
19:15:57 right it would be host_vars/service*.yaml and group_vars/service.yaml rather than host_vars/* group_vars/* sort of thing right?
19:16:04 yeah
19:16:19 what's the job name for base today?
19:16:28 (cause it's not "run base")
19:16:33 infra-prod-run-base iirc
19:16:49 no that's "run base on pretend nodes for testing"
19:16:57 not "actually run base on real nodes"
19:16:57 corvus: infra-prod-base
19:17:33 and it's triggered on all of inventory/, playbooks/host_vars/, and playbooks/group_vars/
19:17:43 (as well as base.yaml and the base roles)
19:18:00 that explains why it ran for my changes; it's hard to change stuff without changing host_vars
19:18:05 yeah
19:18:14 thinking out loud here, I think I'd be happy to try b) with a fallback of c) if we find it is too complicated
19:18:27 mostly wanted to call out the potential for complexity early so that we can try and minimize it
19:18:30 ++ ... I think the file matchers are going to be very similar for b) and c)
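
As a concrete illustration, a rough sketch of what option (b) could look like in Zuul configuration; the job names, pipeline name, and matchers are assumptions for illustration, not the change discussed below:

    # Sketch only: names and matchers are hypothetical.
    - job:
        name: infra-prod-service-zuul-base
        description: Run the base roles, limited to the zuul service's hosts.
        files:
          # narrower than today's infra-prod-base matchers: the base
          # roles plus only the zuul-specific host/group vars
          - inventory/.*
          - playbooks/roles/base/.*
          - playbooks/host_vars/zuul01.openstack.org.yaml
          - playbooks/group_vars/zuul.yaml

    - project:
        deploy:
          jobs:
            - infra-prod-service-zuul-base
            - infra-prod-service-zuul:
                dependencies:
                  # soft: order after service-base when both run for a
                  # change, without requiring it to run every time
                  - name: infra-prod-service-zuul-base
                    soft: true
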
19:18:53 how about I try it for one service and we can look at it
19:19:03 and if it's too complex, we go with c)
19:19:09 wfm
19:19:18 iiuc, b) we'll have a set of like 4-6 file matchers for each service-base job
19:19:24 they'll basically look the same
19:19:38 except the specific groups and hostvars files/dirs will be service related
19:19:49 yeah. and there's not an awesome way to de-duplicate that
19:19:58 so i think it'll be *verbose* but maybe not *complicated*
19:19:59 is it correct to say c) means basically every playbooks/service-*.yaml will have a pre_task of include_tasks: base.yaml?
19:20:01 corvus: ++
19:20:11 ianw: basically yeah
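
For comparison, a minimal sketch of option (c) as paraphrased just above: the service playbook simply runs the base roles ahead of the service's own roles, so there is no separate base job to order against. The base role names are the ones base runs today; the final role is a placeholder, not the actual service playbook contents:

    # playbooks/service-zuul.yaml under option (c) - sketch only
    - hosts: zuul
      roles:
        # base roles run (and mostly no-op) on every service deploy
        - users
        - base-repos
        - base-server
        - timezone
        - unbound
        - exim
        - snmpd
        - iptables
        # placeholder for the service's own role(s)
        - zuul
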
19:20:23 corvus: I just had an idea ...
19:20:58 corvus: I *think* normal ansible supports roles in subdirs referenced with dot notation... maybe we put our base roles in playbooks/roles/base - so we can make the file matchers list smaller
19:21:47 so it would be roles: - base.users
19:22:13 I can do a test of that too and see if it works - could allow us to shrink the file matchers
19:22:28 sounds good
19:22:58 hey
19:23:24 i wonder if there's something we can do about the hostvars like that too
19:23:57 like, could we segregate the base hostvars in such a way that we can more correctly detect when to run base
19:24:31 cause just looking at the roles base runs, they pretty much never change, including the data for them
19:24:49 corvus: I think that's a good idea
19:24:51 but we need to run the job a lot because of hostvars
19:24:56 corvus: if it recursively looks for matching filenames we might be able to use host_vars/zuul/zuul01.yaml and then just match on host_vars/zuul ?
19:25:03 I have no idea if ansible works that way though
19:25:09 I do not think it does
19:25:46 oh - you know ...
19:25:47 the roles are: users, base-repos, base-server, timezone, unbound, exim, snmpd, iptables
19:25:56 yeah
19:26:00 remote: https://review.opendev.org/730937 WIP Move users into a base subdir
19:26:10 I just pushed that up to see if subdirs will work for role organization
19:26:28 we could maybe make a second hostvar location into which we just put base-associated hostvars
19:26:35 maybe inside of the inventory dir
19:26:54 yeah, that sounds worth exploring
19:27:20 so we could have inventory/host_vars/zuul01.openstack.org.yarml into which we put base hostvars for zuul - and playbooks/host_vars/zuul01.openstack.org.yaml has hostvars for the zuul service playbook
19:27:33 but we should write our own plugin that understands .yarml files
19:27:40 that's the key
19:27:48 I think it's super important
19:27:54 similarly ...
19:28:02 we could split the inventory into multiple files
19:28:29 (we own the plugin that does that anyway)
19:28:36 we tend to not change the inventory super often so that may be a good follow on?
19:28:40 yeah
19:28:52 just thinking of ways to simplify matching what to run when
19:29:15 another approach could be to avoid having zuul's job descriptions determine the rules for us
19:29:37 we could have a system that zuul triggered that determined what to execute based on git deltas or similar
19:30:04 (this is me thinking out loud, I think that would take far more effort to get right since we'd be building that from scratch and in many ways it will probably look like what zuul is already doing)
19:30:07 I think that would wind up needing to implement something similar to file matchers
19:30:10 yeah
19:30:36 yeah, though it might centralize it a bit
19:30:52 (we have have a job that decides what child jobs to run)
19:30:55 er
19:30:58 (we can have a job that decides what child jobs to run)
19:31:06 that's a good point
19:32:02 still, i think b and the related things we discussed are a good place to start
19:32:06 as a time check, half our hour is gone now. Do we think mordred's plan to try it out on some services is a good place to start or should we discuss this further?
19:32:17 corvus: ++ from me
19:32:19 and we can think about a dispatch job if it gets wild
19:32:27 I'll get a patch doing b) for service-zuul up after the meeting
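
If the dispatch-job idea above is ever explored, one way it could work is a parent job that inspects the change and uses zuul_return's child_jobs to pick which deploy jobs actually run; a rough sketch with placeholder selection logic:

    # Sketch only: the decision logic here is a stand-in.
    - hosts: localhost
      tasks:
        - name: Decide which deploy jobs this change needs
          set_fact:
            jobs_to_run:
              - infra-prod-service-zuul

        - name: Tell Zuul to run only those child jobs
          zuul_return:
            data:
              zuul:
                child_jobs: "{{ jobs_to_run }}"
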
19:32:48 mordred: thanks
19:32:54 anything else config management related before we move on?
19:33:15 after I get done with that
19:33:20 I want to start working on gerrit upgrade
19:33:32 super
19:33:45 mordred: exciting, there is a recent thread on gerrit performance post upgrade to notedb that we may want to read over
19:33:51 (as part of upgrade planning)
19:33:54 yup. have been reading that
19:34:08 some things about frequent aggressive GC
19:34:16 I was somewhat disappointed that some of the response seems to have been "notedb performs less well, deal with it"
19:34:42 clarkb: moore's law will fix
19:35:36 yeah, i think notedb perf depends a lot on caching
19:36:40 #topic OpenDev
19:36:54 I've sent out the call for advisory board volunteers.
19:37:00 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000026.html Advisory Board thread. We have some volunteers already!
19:37:21 we've gotten a couple responses so far which is good considering there were holidays in many parts of the world recently
19:38:00 * hrw joins
19:39:07 I think the next steps here are to give people enough time to catch up on email, and work through how various groups want to select membership. But in a few weeks I'm hoping we can select a simple operating mechanism for reaching out between the groups (admins and advisory board)
19:39:23 I suggested a simple mailing list subject tag, but we'll see if that makes sense once we've got a membership
19:39:49 any questions or concerns on this?
19:40:09 the subject tag may also be more trouble than it's worth until volume ramps up on that ml (assuming it ever does)
19:40:34 fungi: ya that's possible, I can see it being nice for client side filtration too though
19:40:46 (even if you are getting all of the emails)
19:41:37 it's also something we could talk about during the ptg
19:41:49 ++
19:42:28 #topic General topics
19:42:40 #topic pip-and-virtualenv next steps
19:43:00 ianw: I kept this on the agenda as I didn't see announcement of the changes we had planned. Anything we can do to help or is it simply a matter of time now?
19:43:19 i've been working on the announcement at https://etherpad.opendev.org/p/rm-pip-and-virtualenv
19:43:31 it feels way too long
19:44:00 maybe the summary needs a summary? ;)
19:44:15 #link https://etherpad.opendev.org/p/rm-pip-and-virtualenv announcement draft for pip and virtualenv changes
19:44:29 fungi: yeah, i'll put a tl;dr at the top
19:45:04 sounds like it is moving along then, should we keep this on the agenda for next meeting?
19:45:13 maybe something action focused? like "no action required unless $foo happens in which case you can $bar"
19:45:55 which i *think* is the case :) -- ie, i think we're telling people they shouldn't need to do anything, but jobs are varied, and if something breaks they have options
19:45:55 corvus: good point; i think the main action will be "if virtualenv is not found, install it"
19:46:40 ++ to giving people a shortcut to fixes if they have a problem
19:47:12 ok, will do, will ping for reviews on opendev at some point, thanks
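
To make the "if virtualenv is not found, install it" action concrete, a hedged sketch of the kind of fallback a job could add once the element is removed; installing with pip --user is an assumption here, not wording from the draft announcement:

    # Sketch only: one way a job could restore virtualenv if an image
    # no longer ships it.
    - hosts: all
      tasks:
        - name: Check whether virtualenv is already available
          shell: command -v virtualenv
          register: virtualenv_probe
          failed_when: false
          changed_when: false

        - name: Install virtualenv only when it is missing
          pip:
            name: virtualenv
            extra_args: --user
          when: virtualenv_probe.rc != 0
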
19:47:57 #topic DNS cleanup and backups
19:48:13 fungi: I didn't end up sharing the dns zone contents with foundation staff. I think you may have, did that happen?
19:48:33 yes, they observed there was a lot they could clean up in there, but no concerns publishing the list of records
19:49:35 #link https://review.opendev.org/#/c/728739/
19:49:42 is the final job that backs up all the RAX domains
19:50:48 cool
19:50:51 should we etherpad the openstack.org zone so we can go through it, maybe put "DELETE" next to things people know can go, then I can clean it up at some point?
19:51:18 ianw: that works for me. We can also share that etherpad with the foundation admins and they can help annotate it too?
19:51:32 though maybe it is better for them to delete the things they know about
19:52:46 #topic Using HTTPS with in region mirrors
19:52:51 moving along as we only have a few minutes left
19:52:56 #link https://review.opendev.org/730861 Test ssl with mirrors via base-test
19:52:57 #link https://etherpad.opendev.org/p/rax-dns-openstack-org
19:53:03 ^^ to go through
19:53:10 #link https://review.opendev.org/730862 Use ssl with mirrors in production if base-test is happy
19:53:37 I've been pushing this along now that ianw rebuilt all of our mirrors. The big upside to this is we get a bit more assurance that nothing silly is happening with packages since we don't sign them with reprepro
19:53:49 also with pypi it will be nice to have too since it basically relies on ssl for all its trust
19:54:14 the first change I've linked will update base-test; we can check bindep and things are happy with it, then roll it out globally
19:54:37 ++
19:54:46 #topic Scaling Meetpad/Jitsi Meet
19:55:03 my change to configure a jvb server has landed. I think that means we can deploy a jvb server
19:55:08 let's do it
19:55:17 This is something I can likely do tomorrow if no one else can do it sooner
19:55:30 it'll be good to have that up and running, then we can add others quickly if needed
19:55:46 any new server called jvb01.opendev.org in our inventory should be configured properly (and jvb02, 03, 99 etc)
19:56:01 hopefully we don't need more than 100
19:56:27 though we probably support an arbitrary number of digits there
19:56:27 I'm totally happy with someone else spinning one up today if they have time. otherwise I've got it on the todo list for tomorrow
19:57:28 #topic Project Renames
19:57:40 A reminder that June 12 was pencilled in last week. I think that is still looking ok for me
19:57:46 wfm
19:57:57 I'll try to bring it up with the openstack tc soon so that any other renames they want can be included at that time too
19:58:15 #topic Virtual PTG Attendance
19:58:21 #link https://virtualptgjune2020.eventbrite.com Register if you plan to attend. This helps with planning details.
19:58:28 #link https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 PTG Ideas
19:58:38 Please register and feel free to add ideas to our planning document
19:58:50 Any thoughts on whether or not we should have this meeting next week?
19:59:22 unlike a normal PTG we won't be distracted by travel and timezones. The PTG times we've requested do not conflict with this meeting
19:59:57 I guess I can show up and if others do too we'll have a meeting.
20:00:08 yeah, maybe play it by ear?
20:00:11 I don't have to go out of my way to be around during that time period
20:00:31 also a reminder that the PTG is happening next week
20:00:53 and that takes us to the end of our hour
20:00:56 thank you everyone
20:00:58 #endmeeting