19:01:02 <clarkb> #startmeeting infra
19:01:03 <openstack> Meeting started Tue Aug 11 19:01:02 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 <openstack> The meeting name has been set to 'infra'
19:01:11 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000070.html Our Agenda
19:01:38 <clarkb> #topic Announcements
19:01:56 <clarkb> I'll be out tomorrow in order to get outside and do some socially distant fishing
19:02:06 <ianw> o/
19:02:09 <zbr> o/
19:02:25 <clarkb> may as well take advantage of the early morning schedule for opendev event and get out on the water early too :)
19:02:33 <clarkb> any other announcements?
19:03:03 <frickler> o/
19:03:08 <clarkb> #topic Actions from last meeting
19:03:18 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-08-04-19.01.txt minutes from last meeting
19:03:34 <clarkb> we didn't record any actions, but the week prior had an action for ianw to look into python wheel caching for more than openstack
19:03:38 <clarkb> ianw: ^ anything new to add to that?
19:04:09 <ianw> i haven't pushed any changes, but yes i started playing with the manylinux docker container builders
19:05:00 <ianw> https://github.com/pyca/cryptography/issues/5292#issuecomment-671759306
19:05:38 <ianw> that will be useful for us, but also be something that can be useful for others too.  although how it gets consumed is an open question
19:05:55 <clarkb> ianw: I think a good chunk of the wheels we build assume things about system packages like libvirt? but even that one may be able to be manylinux'd if it can handle many libvirt apis
19:06:16 <clarkb> that said I suspect a number are actually reducible to manylinux
19:06:34 <ianw> yeah, i think it's an "and" thing rather than an "or"
19:07:36 <ianw> anyway, i got a bit sidetracked into other things as well, so didn't make great progress, but still looking at it
19:07:41 <clarkb> thanks
19:07:43 <clarkb> #topic Specs approval
19:07:51 <clarkb> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:08:09 <clarkb> I've reviewed the latest patchset for this change and it looks good to me
19:08:30 <clarkb> but I don't think many others have reviewed it yet. Is this something we think we might put up for approval next week and hopefully do any remaining iteration this week?
19:09:48 <fungi> i wouldn't mind corvus and mordred taking a look through it since they're technically co-authors (i copied and pasted text from some of their e-mails from years past)
19:10:02 <corvus> i'll do that this week
19:10:06 <fungi> thanks!
19:10:23 <clarkb> thank you. I guess we'll see where we end up and possibly will have that up for approval next week
19:10:34 <clarkb> #topic Priority Efforts
19:10:35 <fungi> would just hate to have a spec they sort of co-wrote merge without at least getting skimmed ;)
19:10:45 <clarkb> fungi: ++
19:10:47 <clarkb> #topic Update Config Management
19:11:01 <clarkb> I've been poking at some Gerrit related docker things recently
19:11:17 <clarkb> Gerritbot containerization in progress at https://review.opendev.org/#/c/745573/1 and parents
19:11:24 <clarkb> #link https://review.opendev.org/#/c/745240/ in particular needs second review.
19:11:47 <clarkb> if I can get a second review that is happy with ^ I'll try to land and coordinate those changes with review.o.o and eavesdrop today
19:12:12 <clarkb> the ansible group vars should all be set. It's just a matter of stopping gerritbot on review.o.o and ensuring the process that starts in docker on eavesdrop is happy
19:12:50 <clarkb> I've also been trying to better understand the gerrit upgrade process which has led to a change with image fixes for gerrit
19:12:52 <clarkb> #link https://review.opendev.org/745595 fixes for gerrit plugins on newer gerrit images
19:13:08 <clarkb> that change as is is a bit omnibus like. I'm happy to split it up a bit if reviewers would prefer
19:13:42 <clarkb> mostly what it does is checkout valid versions of plugins so they build properly across the gerrit versions. It also addresses a "javamelody is special" problem with plugin building
19:14:36 <clarkb> Any other config management items to bring up?
19:15:46 <fungi> that reminds me i need to finish the mirror-update ansibilification for reprepro mirrors
19:16:29 <clarkb> #topic OpenDev
19:17:04 <clarkb> #link https://review.opendev.org/741277 Gerritlib change to support creating projects with non master HEAD
19:17:19 <clarkb> #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change. This change updates jeepyb to toggle that flag
19:17:58 <clarkb> For review-test does anyone understand the state it is in?
19:18:15 <clarkb> ianw: I think you had to disable the upstream tracking cron because it had filled the disk with logs?
19:18:44 <ianw> yes, but the point was more that ansible wasn't completing on it so i had to do it manually
19:19:08 <clarkb> do we need to add it to the emergency file while we figure out what it needs?
19:19:55 <ianw> umm, i forget now why it wasn't simple to fix
19:21:11 <clarkb> ok, it would probably be a good idea to see if we can keep it from interacting with other production things
19:21:50 <clarkb> from a general upgrade perspective I've been trying to bootstrap myself on the process there so that one can be written down for testing on review-test
19:22:03 <ianw> http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-08-04.log.html#t2020-08-04T21:52:30
19:22:04 <clarkb> that is how I found the docker image issues I'm trying to fix in that change above
19:22:09 <zbr> clarkb: i can help with gerritbot, probably gerritlib too, just add me as reviewer
19:22:16 <ianw> AnsibleUndefinedVariable: 'gerrit_vhost_name' is undefined
19:22:34 <ianw> that's what review-test is failing on, if that rings any bells for anyone
19:22:58 <fungi> maybe it's as simple as just adding a missed hostvar
19:23:12 <clarkb> corvus: one thing in particular that I've learned/discovered is that an upgrade to 2.15 should be really straightforward. Do you know why it was rejected or at least not the preferred route when you and mordred talked to gerrit?
19:23:41 <fungi> though we do set gerrit_vhost_name: review-test.opendev.org in inventory/service/host_vars/review-test.openstack.org.yaml
19:23:47 <clarkb> (I know it doesn't get us to fully notedb, but it is forward progress and likely could be done with minimal downtime, though I've yet to test 2.13 -> 2.15 without 2.14 in the middle)
19:24:28 <ianw> fungi: yeah i think that's about where i got to :)
19:24:52 <corvus> clarkb: one thing i remember is *do not use notedb in 2.15*
19:25:19 <clarkb> corvus: ya the default is that notedb is only used for accounts on 2.15 iirc
19:25:20 <corvus> clarkb: so since we can't actually make progress on the conversion until 2.16, would an upgrade to 2.15 get us anything?
19:25:28 <ianw> fungi: yeah, that's right ... it started getting into how we deploy the host vars which got all split up and changed and *that's* why it wasn't trivial
19:25:55 <clarkb> corvus: it gets us momentum? and simplifies the testing process for us to get there as we don't have to do the full notedb profiling (disk use and performance)
19:26:04 <corvus> clarkb: i'm not sure if even that is safe?  i dunno.
19:26:48 <clarkb> maybe that is something we should ask upstream. Mostly I'm thinking that we could get that done relatively quickly while still working towards the 2.16 notedb transition too
19:26:51 <corvus> clarkb: i just remember them saying that notedb in 2.15 is wrong, and doing a conversion in 2.15 would prevent a correct upgrade later.
19:27:18 <corvus> clarkb: like, i don't get what the purpose is?  what is easy in 2.15 that's hard in 2.16?
19:27:33 <clarkb> corvus: 2.15 doesn't use notedb for changes
19:27:37 <corvus> from my pov, 2.16 is like 2.15 but without bugs
19:27:44 <clarkb> corvus: that's not how I read their docs
19:27:53 <clarkb> 2.15 doesn't use notedb for changes because it's buggy
19:28:07 <clarkb> and you keep using the system that worked previously (which is why it is easier, we know the performance of that)
19:28:20 <corvus> you can do that with 2.16 too, right?
19:28:45 <clarkb> corvus: their docs actively recommend against that (but don't provide a reason), but yes we could go to 2.16 with no notedb then switch to notedb after potentially
19:29:15 <fungi> i guess it's a question of whether they're saying not to run 2.16 without notedb (so prevents us from decoupling the upgrade and db migration steps) or whether it's safe to run with everything still in mysql on 2.16 and then perform the notedb migration
19:29:18 <clarkb> but ya maybe the way to frame it is more around change in notedb rather than specific versions. I think we can relatively easily upgrade to a gerrit version without converting to changes notedb
19:29:44 <clarkb> I think it is the notedb conversion itself that we want to carefully test and profile to ensure we don't run out of disk, the performance remains reasonable, and to deal with any replication change requirements
19:29:45 <corvus> clarkb: say you're right: let's say it's easy to upgrade to 2.15 without making any changes.  why would we do it?  it doesn't help the upgrade to anything later.
19:29:57 <corvus> it's also not supported
19:30:05 <corvus> (even 2.16 isn't supported at this point)
19:30:27 <corvus> to me, it just feels like wasting time that could be spent working on an upgrade to 2.16
19:30:30 <clarkb> I think there is value in making some progress
19:30:39 <clarkb> it gets us polygerrit
19:30:42 <corvus> i don't think it's progress
19:30:46 <corvus> it's a bad/wrong polygerrit
19:30:54 <corvus> 2.16 is the polygerrit that we actually want to introduce users to
19:31:01 <zbr> i am inclined to believe doing both would be less work, less risks.
19:31:02 <clarkb> what is bad or wrong about it?
19:31:08 <fungi> another question. if we upgrade to 2.16 and migrate to notedb all in one shot, but subsequently discover issues with it, would we want to try to roll back to not-notedb or roll forward to a polygerrit-only 3.x and hope the issues resolved themselves?
19:31:31 <corvus> i think 2.15 polygerrit is different than 2.16; 2.16 is like what's in 3.x
19:32:39 <clarkb> I'm just looking at the problem in front of us and a big part of it (from my perspective) is its size. It feels huge and breaking it down if we can would help us make progress I think
19:32:52 <clarkb> it isn't perfect, and we should continue to plan to upgrade further
19:32:56 <corvus> clarkb: there is a significant chance (i say this based on past upgrades) that each version upgrade will cause us to burn cycles on version-specific issues.  i'd hate to spend time on a 2.15 specific issue rather than 2.16.
19:33:19 <corvus> clarkb: in general, i agree with small incremental steps.  i just don't see what the incremental step between here and 2.16 is.
19:33:28 <clarkb> corvus: that's fair, maybe we should ask about the risk with not doing the notedb migration immediately on 2.16? then we can continue to split the problems up?
19:34:25 <clarkb> from an operational perspective it would simplify the images we need to manage
19:34:30 <clarkb> we could drop 2 or 3 of them
19:34:30 <corvus> clarkb: luca has offered to help us upgrade, and i think that would be exactly the sort of thing he would welcome as part of that offer
19:34:51 <corvus> (that question)
19:35:05 <clarkb> corvus: that's great to hear, why don't I do a bit more local investigating (there are a few scenarios I wanted to test like skipping versions) then try and compile a couple of questions like that for luca
19:35:12 <corvus> sorry if that wasn't clear; i'm suggesting that asking luca about running 2.16 without notedb is a good q
19:35:16 <clarkb> yup
19:35:30 <corvus> also, the q about 2.15
19:35:48 <fungi> also maybe he can clarify whether running the 2.15 its-storyboard plugin with 3.x is a bad idea
19:36:22 <fungi> since they don't seem to have branched it past 2.15
19:36:30 <corvus> (my guess is he's more likely to agree that upgrading to 2.15 is a good idea)
19:36:59 <corvus> (but he doesn't have my lived experience with what it takes to perform an opendev gerrit upgrade)
19:37:12 <corvus> (he upgrades continually :)
19:37:30 <clarkb> I should be able to have tested our updated images by the end of the week and have a general sense for what is potentially a good idea and what isn't based on simple local testing. Then work up an email for luca
19:38:17 <clarkb> having images that don't spit out tracebacks on startup was step 0 there :)
19:38:27 <clarkb> anything else on opendev?
19:39:11 <clarkb> #topic General Topics
19:39:19 <clarkb> #topic Bup and Borg
19:39:31 <clarkb> ianw I think this is largely a borg topic now. The borg change has the reviews it needs
19:40:01 <ianw> yep, i just need to start a server and get something in to test, still on my short-term TODO sorry
19:40:15 <clarkb> no worries. I think we're all pretty swamped. But wanted to make sure you saw that
19:40:26 <clarkb> #topic Github 3rd Party CI
19:40:38 <clarkb> how is this going?
19:41:00 <ianw> so OK I think, no complaints
19:41:21 <clarkb> are they consuming it as a CI system yet or are we still in the water temp testing stage?
19:41:28 <fungi> praise would be better, but i'll take no complaints ;)
19:41:28 <ianw> next thing for pyca/cryptography is we should enable it for master commits, as well as pull requests
19:41:46 <ianw> the project has a .zuul.d directory committed, so that's good :)
19:42:29 <ianw> so probably the next thing is to see if we can fit into wheel generation somewhere, as described previously
19:43:38 <ianw> i got some private communication that libxml was also in need of similar arm64 resources
19:44:44 <clarkb> lxml or libxml?
19:45:03 <clarkb> but ya it wouldn't surprise me if there is a similar need for many of those more costly python packages we've seen end up in the openstack wheel cache
19:45:52 <clarkb> anything else on this topic?
19:47:06 <clarkb> #topic Open Discussion
19:47:13 <ianw> lxml sorry
19:47:25 <fungi> yeah, lxml links libxml when building
19:47:27 <clarkb> we have a bit of time for any other items that are shareable
19:47:46 <fungi> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016424.html dates for ptg have been firmed up (it's the week after the summit)
19:49:07 <clarkb> oh I had missed that
19:49:24 <clarkb> the 6 hours we had last time seemed to work well so I'll probably schedule a similar block.
19:50:04 <clarkb> also we'll likely want to double check meetpad is still happy a few weeks prior to that (iirc we auto update the images so we may pull in new things we need to accommodate)
19:51:14 <corvus> my patch isn't merged upstream, so the web server container is still pinned
19:51:45 <clarkb> we should also consider scaling it up again
19:52:03 <clarkb> though we probably only need ~2 extra servers this time based on load last time (really the bottleneck most people saw seemed to be in the browser)
19:54:36 <clarkb> sounds like that may be it. Thank you everyone. We'll see you here next week.
19:54:40 <clarkb> #endmeeting