19:01:18 <clarkb> #startmeeting infra
19:01:19 <openstack> Meeting started Tue Aug 18 19:01:18 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:21 <ianw> o/
19:01:22 <openstack> The meeting name has been set to 'infra'
19:01:32 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000077.html Our Agenda
19:01:39 <clarkb> #topic Announcements
19:01:50 <clarkb> I had no announcements.
19:02:35 <clarkb> #topic Actions from last meeting
19:02:45 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-08-11-19.01.txt minutes from last meeting
19:02:48 <clarkb> There were no actions
19:02:57 <clarkb> #topic Specs approval
19:03:02 <clarkb> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:03:38 <clarkb> This got a new patchset after some feedback from corvus. I expect it is still ready for approval
19:03:51 <clarkb> Do we want to put it up for approval this week and get rereviews by say friday?
19:04:24 <fungi> seems like a plan to me
19:04:32 <corvus> ++
19:04:39 <fungi> and yeah, today's update there was just clarification
19:04:51 <fungi> and only touched one paragraph
19:04:59 <clarkb> great. I'll be sure to rereview and plan to approve it friday if there are no objections between now and then
19:05:12 <clarkb> #topic Priority Efforts
19:05:19 <clarkb> #topic Update Config Management
19:05:38 <clarkb> I'm still pushing on the gerrit(bot) things here
19:05:40 <clarkb> #link https://review.opendev.org/746181 Final followup for gerritbot containerization
19:05:53 <clarkb> this change should finish up the remaining todo items for gerritbot's containerization
19:06:05 <clarkb> that ensures we're updating the container image on eavesdrop
19:07:00 <clarkb> #link https://review.opendev.org/#/c/746335/ add missing files to config management
19:07:14 <clarkb> This is a fixup for the ansibleification of gerrit that I noticed when testing gerrit upgrades locally
19:07:33 <clarkb> we stopped managing the logo svg and the jquery js files that hideci uses
19:07:45 <clarkb> #link https://review.opendev.org/746784 More image cleanups
19:08:05 <clarkb> and this last one is where I'm currently at on the gerrit upgrade testing. That should make gerrit startup cleaner. But I want to test it
19:08:19 <clarkb> I've also discovered there is some sort of problem with the gerrit plugin manager on gerrit 3.0 that I haven't figured out yet
19:08:36 <clarkb> thats a bit lower priority as 3.0 happens after 2.16, but I'll still try to sort it out if I can
19:08:48 <clarkb> Any other config management updates to bring up?
19:09:01 <clarkb> fungi: I know you mentioned you wanted to pick up the mirror update reprepro in ansible changes again
19:09:20 <fungi> yeah, i haven't gotten to it yet though
19:09:43 <fungi> if anyone else is excited to work on it though, i don't mind anyone chipping away at the conversion
19:10:00 <fungi> it's just dozens of thankless erb to j2 template conversions
19:11:31 <clarkb> #topic OpenDev
19:12:08 <clarkb> For the Gerrit upgrade process I sent some questions to Luca which resulted in some good info. TL;DR is that we should be able to upgrade to 2.16 without notedb then do the notedb conversion separately
19:12:29 <clarkb> I like this because it breaks the fairly large upgrade into two more manageable pieces
19:13:43 <clarkb> I'm still iterating on our images (as noted above). So far all the upgrades I've been doing have been in succession with online reindexing. Once I've got the image into a happy spot I'm going to start testing skip level type upgrades and see if we can stop gerrit, run 2.14 init, 2.15 init, 2.16 init, reindex, then start, as I expect that will end up being the quickest way for us to upgrade if it works
19:14:19 <corvus> this all sounds good to me
19:14:24 <clarkb> but all of that is still an unknown. My goal is to be able to write up an upgrade process to 2.16 using our images that we can then apply to our actual data.
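The skip-level sequence described above (stop, init with each release in turn, offline reindex, start) could be sketched roughly as follows. This is only a dry-run plan builder, not the process that was eventually written up: the war file names, the site path, and the stop/start commands are hypothetical placeholders, and it assumes Gerrit's standard `init --batch -d <site>` and `reindex -d <site>` programs are what each step would run.

```python
from typing import List


def upgrade_plan(site: str, wars: List[str],
                 stop_cmd: List[str], start_cmd: List[str]) -> List[List[str]]:
    """Build (but do not run) the command sequence for a skip-level upgrade.

    `wars` is the ordered list of Gerrit war files to run `init` with,
    oldest release first. All paths and service commands are assumptions.
    """
    if not wars:
        raise ValueError("at least one war file is required")
    plan = [list(stop_cmd)]
    # Run init once per intermediate release so each schema migration applies.
    for war in wars:
        plan.append(["java", "-jar", war, "init", "--batch", "-d", site])
    # Offline reindex once, using the final release only.
    plan.append(["java", "-jar", wars[-1], "reindex", "-d", site])
    plan.append(list(start_cmd))
    return plan


if __name__ == "__main__":
    steps = upgrade_plan(
        "/path/to/gerrit/site",  # hypothetical site directory
        ["gerrit-2.14.war", "gerrit-2.15.war", "gerrit-2.16.war"],
        stop_cmd=["docker-compose", "down"],  # placeholder stop command
        start_cmd=["docker-compose", "up", "-d"],  # placeholder start command
    )
    for step in steps:
        print(" ".join(step))
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; timing each step against a copy of real data would be the actual test of whether this is quicker than successive online reindexes.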
19:15:16 <clarkb> and I'm sure I'll have more questions for luca. Related to that luca offered to do a conference call if we wanted. Are others interested in being included in that? If so let me know and I'll include you for scheduling if/when that happens
19:15:40 <clarkb> I'm also thinking syncing up with luca that way once I've got an upgrade process written down may be good as we can talk about our plan and see if he has any concerns with it
19:16:22 <clarkb> #link https://review.opendev.org/741277 Needed in Gerritlib first as well as a Gerritlib release with this change.
19:16:28 <clarkb> #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change.
19:16:38 <clarkb> two other opendev related changes that would be good to review if you get a chance
19:16:47 <clarkb> Anyone else have opendev topics they want to bring up?
19:17:11 <fungi> rackspace volume maintenance maybe
19:17:33 <fungi> just got word today that there will be outages in october for all our current cinder volumes in dfw
19:17:42 <clarkb> did they give specific dates or just that month?
19:17:54 <fungi> no specific dates yet
19:18:06 <fungi> i've converted the uuid list to volume names and broken them down by what we ought to do
19:18:21 <fungi> #link https://etherpad.opendev.org/p/2020-10-rax-dfw-volume-maint October Volume Maintenance
19:19:03 <fungi> the ones in the "migrate" list we want to avoid outages for, and could attach new volumes and pvmove (all new volumes created aren't impacted by the maintenance, only existing volumes)
19:19:39 <fungi> the outage list is those where we could fix them after with modest impact, or maybe turn them off for migration
19:20:05 <fungi> the delete list is there because i noticed we have three which aren't attached ("available" according to cinder) so suggesting we just clean those up
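For the "migrate" list, the attach-a-new-volume-and-pvmove approach mentioned above usually comes down to a short sequence of standard LVM commands. The sketch below only builds that sequence as a dry run; the volume group name and device paths are hypothetical, and it assumes the new cinder volume is already attached and that each volume backs a single LVM physical volume.

```python
from typing import List


def pvmove_plan(vg: str, old_pv: str, new_pv: str) -> List[List[str]]:
    """Return the LVM commands to move a volume group onto a new device.

    Nothing is executed here; the names passed in are assumptions.
    """
    return [
        ["pvcreate", new_pv],        # initialise the new volume as a PV
        ["vgextend", vg, new_pv],    # add it to the existing volume group
        ["pvmove", old_pv, new_pv],  # move extents while the fs stays online
        ["vgreduce", vg, old_pv],    # drop the old PV from the volume group
        ["pvremove", old_pv],        # wipe the LVM label before detaching
    ]


if __name__ == "__main__":
    # Hypothetical volume group and device names.
    for cmd in pvmove_plan("main", "/dev/xvdb1", "/dev/xvdc1"):
        print(" ".join(cmd))
```

After the last step the old cinder volume can be detached and deleted, which is what keeps the service out of the maintenance window.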
19:20:14 <clarkb> fungi: review.o.o's is in two separate lists
19:20:17 <clarkb> not sure if that was intentional
19:20:48 <corvus> we can migrate it, and then take an outage for extra fun
19:20:56 <clarkb> thank you for putting that together, other than the review.o.o double listing I think that looks good
19:21:02 <fungi> ahh, yeah checking now to see if that matched twice somewhere
19:21:12 <fungi> (i've already cleaned it up on the pad)
19:21:48 <fungi> yeah, that was just a drag-n-drop turning into a cut-n-paste looks like
19:21:58 <fungi> there aren't two volumes with that name
19:22:24 <fungi> anyway, the breakdown there was just my first stab. if folks think we should shuffle anything between migrate and outage feel free
19:23:07 <fungi> and of course we *could* migrate more of them if there's time, but i'd want to prioritize the ones we know would otherwise be painful
19:23:15 <fungi> anyway, that's all i have on that
19:23:20 <clarkb> thanks!
19:23:28 <clarkb> #topic General Topics
19:23:37 <clarkb> #topic Bup and Borg Backups
19:23:49 <clarkb> Doesn't look like the change has merged yet. ianw has been busy with other things
19:23:58 <clarkb> #topic github 3rd party ci
19:24:11 <clarkb> ianw does report that we've hit a speedbump on the arm64 wheel generation problem
19:24:37 <ianw> yeah, i did a bunch of stuff to get us manylinux2014_aarch64 wheels
19:24:37 <fungi> oh, yeah, this is a "fun" (in an unfortunate way) issue
19:24:42 <clarkb> in particular the goal with working with cryptography was to produce manylinux wheels that could be hosted on pypi and help everyone, but they've discovered that ubuntu and centos use different page sizes on arm64
19:25:20 <fungi> with no way to differentiate those for pypi/pip apparently
19:25:21 <clarkb> linux allows for 4k and 64k page sizes on arm64. ubuntu and centos choose differently. It sounds like we may be able to force 64k for everyone, since anything 64k aligned is also 4k aligned (but not vice versa)
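The alignment argument above is plain arithmetic: 64k is a multiple of 4k, so a binary whose segments are 64k aligned satisfies both kernels, while a 4k-aligned one only satisfies the 4k kernel. A minimal sketch, using a deliberately simplified model in which a segment is mappable when its alignment is a multiple of the kernel page size (the function name is made up for illustration):

```python
PAGE_4K = 4 * 1024
PAGE_64K = 64 * 1024


def mappable(segment_alignment: int, kernel_page_size: int) -> bool:
    """Simplified model: a segment can be mapped when its alignment is a
    whole multiple of the page size the kernel was built with."""
    return segment_alignment % kernel_page_size == 0


if __name__ == "__main__":
    for align in (PAGE_4K, PAGE_64K):
        for page in (PAGE_4K, PAGE_64K):
            print(f"align={align:>6} page={page:>6} ok={mappable(align, page)}")
```

Only the 4k-aligned wheel on a 64k-page kernel fails in this model, which matches the mismatch described in the discussion.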
19:25:38 <corvus> are the pyca folks aware of this now?  (ie, did we at least help them discover/understand this problem?)
19:25:45 <clarkb> but I think that is all work that needs to be done to the upstream python manylinux builder images and from there we can pull it in
19:25:50 <clarkb> corvus: that is my understanding ya
19:26:07 <clarkb> there is commentary through their github issue tracker /me looks for a link
19:26:26 <fungi> manylinux2014 is centos-based right?
19:26:51 <fungi> so basically the idea would be to tweak the manylinux2014 reference to force 64k page size
19:27:15 <clarkb> ya I'm not finding a link, maybe it hasn't gone upstream yet?
19:27:23 <clarkb> fungi: yes its a centos7 in this case iirc
19:27:57 <ianw> there's a few related discussions, the problem was more in libffi as they released a wheel and our ci found it
19:28:24 <clarkb> oh they pushed a wrongly aligned wheel to pypi and then we tried to use it with a different page size? neat
19:28:53 <ianw> yep, it was our wider distro testing that flagged it
19:29:14 <clarkb> anyway Just wanted to call out that progress continues to be made here, and sounds like its ending up as good feedback more globally for arm64 python wheels
19:29:21 <clarkb> ianw: is there anything else you want to call out on this topic?
19:29:33 <fungi> so it sounds like we're uncovering problems which haven't gotten much attention yet, i guess that's a good thing in the long run
19:29:48 <corvus> yeah, this is a short-term disappointment in the middle of a long-term benefit
19:29:57 <ianw> the other thing is they just (like a few hours ago) switched to travis-ci.com ... which apparently gives them access to whatever the aws arm64 hardware thing is
19:30:26 <corvus> so they don't need us anymore?
19:30:31 <ianw> e.g. https://github.com/pyca/cryptography/pull/5416
19:31:18 <ianw> well ... maybe.  it's not 100% clear to me what runs on hardware or not
19:31:42 <clarkb> its also possible that hardware/distro diversity is a good thing here to uncover problems like the page alignment issue
19:32:01 <clarkb> at least until the arm64 python ecosystem works out those gotchas
19:32:54 <ianw> the other thing was the rust support they're adding
19:33:13 <ianw> #link https://review.opendev.org/746423
19:33:38 <ianw> that adds an ensure-rust role, which worked for upstream jobs (after i figured out where to depends-on for github issues :)
19:34:05 <corvus> who's adding rust support?
19:34:21 <clarkb> corvus: cryptography wants to link to rust as well as C
19:34:28 <corvus> gotcha
19:34:38 <ianw> #link https://github.com/pyca/cryptography/pull/5410
19:34:41 <ianw> corvus: ^
19:35:36 <corvus> btw, do we want to continue leaving comments in the PRs?
19:35:52 <corvus> (we can turn that off now that checks are there; some people like them, some people see them as spammy)
19:35:59 <ianw> oh, that was another thing, there was some discussions about that
19:36:08 <ianw> yeah, we can turn that off
19:36:24 <corvus> i'm guessing if there were discussions, then there's at least some "these are spammy" sentiment :)
19:37:06 <corvus> should just be a matter of dropping the message stanza from the pipeline
19:37:08 <ianw> #link https://foss.heptapod.net/pypy/cffi/-/issues/468
19:37:17 <ianw> that was the discovery of the page size issues fyi
19:37:21 <ianw> yeah, i can do that
19:37:32 <ianw> the other thing they wanted was a "re-run" button
19:37:41 <ianw> apparently some ci's do that
19:37:46 <clarkb> different than "recheck" comments?
19:37:48 <fungi> rather than leaving a recheck comment
19:37:52 <corvus> i think we can with github checks
19:37:58 <ianw> https://imgur.com/a/ok7WNqs
19:38:26 <ianw> github definitely issues a rerun hook, and we handle it
19:38:46 <clarkb> do we need to change anything then?
19:38:50 <ianw> #link https://developer.github.com/webhooks/event-payloads/#webhook-payload-object
19:39:03 <clarkb> I guess update the trigger config to fire on the rerun call?
19:39:07 <ianw> #link https://developer.github.com/v3/checks/runs/#check-runs-and-requested-actions
19:39:25 <ianw> i'm not sure if *maybe* we need to define the custom button?
19:39:39 <ianw> To create a button that can request additional actions from your app, use the actions object when you Create a check run. For example, the actions object below displays a button in a pull request with the label "Fix this." The button appears after the check run completes.
19:40:21 <clarkb> "use the actions object" <- I guess zuul may need to learn about github actions?
19:40:35 <fungi> oh, hah, so we *can* do it for pyca/cryptography, but projects who want to gate with zuul can't because of the whole apps can't have control over a repo with actions problem?
19:40:56 <fungi> or has that been solved in recent months?
19:41:02 <clarkb> pabelanger would probably know
19:41:17 <ianw> i guess i should probably write a story
19:41:40 <ianw> and then the *other* thing that was brought up was re-running a single job
19:41:57 <corvus> there is existing support in zuul for re-running checks
19:42:10 <ianw> i know we've had that discussion over and over in various ways.  i couldn't find something canonical to point to
19:42:38 <clarkb> ianw: that came up elsewhere recently. I think its the wrong thing for openstack/opendev but can see that being something zuul grows for other use cases
19:43:06 <clarkb> but I also expect that requires significantly more updates to zuul to support
19:43:06 <corvus> ianw: this is the canonical thing to point to: https://zuul-ci.org/docs/zuul/discussion/github-checks-api.html
19:43:45 <corvus> that page also talks about re-run
19:44:10 <corvus> i'd like clarification on clarkb's question -- do we need to change anything?
19:44:16 <corvus> (ie, is re-run not working as expected?)
19:44:59 <clarkb> corvus: looking at that doc maybe our pipeline config to handle the rerun requests?
19:45:08 <clarkb> but it seems like github automatically sets up the desired buttons
19:45:10 <ianw> comment recheck does, but specifically i think they wanted that "re-run" button to appear to be consistent with other ci
19:45:16 <corvus> sure
19:45:30 <corvus> i'm waiting on a clear statement of "the comment button does not appear as expected"
19:45:31 <clarkb> ianw: "Github provides a set of default actions for check suites and check runs. Those actions are available as buttons in the Github UI. Clicking on those buttons will emit webhook events which will be handled by Zuul." is what the zuul doc says
19:45:43 <corvus> er the 'rerun' button
19:45:50 <corvus> because right now, i expect it to appear
19:46:03 <corvus> so i need to understand if there even is a problem
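For context on what "handling the rerun hook" involves: when someone clicks the re-run button, GitHub delivers a `check_run` or `check_suite` webhook event whose `action` field is `rerequested`. The sketch below shows how a generic receiver might recognise that. It is not Zuul's implementation, and the function name and sample payloads are made up for illustration.

```python
from typing import Any, Dict

RERUN_EVENTS = ("check_run", "check_suite")


def is_rerun_request(event: str, payload: Dict[str, Any]) -> bool:
    """Return True when a webhook delivery is a GitHub re-run request.

    `event` is the value of the X-GitHub-Event header and `payload` is the
    decoded JSON body of the delivery.
    """
    return event in RERUN_EVENTS and payload.get("action") == "rerequested"


if __name__ == "__main__":
    # Illustrative, heavily trimmed payloads.
    print(is_rerun_request("check_run", {"action": "rerequested"}))
    print(is_rerun_request("check_run", {"action": "completed"}))
    print(is_rerun_request("issue_comment", {"action": "created"}))
```

Whether the button is shown at all is decided on the GitHub side, so a receiver like this only helps once the event is actually being sent.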
19:46:29 <ianw> well, maybe you want to try and catch reaperhulk in #cryptography-dev -- i don't think it appears for non-admin users
19:46:54 <corvus> i think if it appears for admin users, then this is not our problem :)
19:46:56 <fungi> presumably github only shows the re-run widget to users who have permission to trigger it (via whatever acls github enforces on those)?
19:47:20 <ianw> corvus: no i mean he's admin and not seeing it, and i'm not so i think i can't see it in any case
19:47:50 <corvus> note that the docs say it only appears for failing runs
19:47:59 <corvus> (which is, imho, a bad choice on github's part)
19:48:12 <fungi> wow, really?
19:48:19 <fungi> that's an odd decision indeed
19:48:33 <ianw> i think this was in the context of the failing runs from the ffi fallout, but i may be wrong
19:48:39 <corvus> (we recheck successful runs all the time, in fact, i'd argue that's the more legitimate case for rechecking but i'd be arguing with the wind)
19:48:40 <fungi> that really just reinforces the whole "recheck until it passes" mindset too
19:48:46 <corvus> fungi: that
19:49:44 <clarkb> as a time check we have 2 more items to talk about. Maybe we can continue this conversation in #zuul?
19:49:48 <corvus> i don't think i can commit to working with the pyca folks to improve their github experience
19:50:03 <corvus> but atm, i don't think there's anything lacking from zuul in order for it to do what they want
19:50:18 <fungi> but possibly some of the github users in #zuul know what the misconfiguration might be there
19:50:25 <corvus> if there even is one
19:50:41 <corvus> let's start with a clear problem statement :)
19:51:35 <clarkb> #topic Making ask.openstack.org read only
19:51:41 <clarkb> #link https://review.opendev.org/#/c/746497/ set ask.openstack.org to read only
19:51:53 <clarkb> we've talked about sunsetting this service for a long time and ttx has written a change to start that process
19:52:09 <clarkb> There is also an openstack-discuss thread on the subject
19:52:23 <clarkb> I don't expect this will get any objections from this group, but wanted to call it out in case there were any concerns
19:52:47 <clarkb> what that change should do is make the running service read only and give people a message about it and alternative locations for questions
19:53:09 <clarkb> ianw: ^ you did the last ask deployment so may be able to offer some of the flavor text behind this if people ask
19:53:22 <fungi> it's like the author designed a sunsetting feature right in
19:54:19 <clarkb> #topic PTG Planning
19:54:27 <ianw> clarkb: ^ sure
19:54:38 <fungi> the other concern worth raising is that we likely won't/can't leave it up indefinitely even in a read-only state, as it's complex and unmaintained software and the distro release we're able to deploy it on now is reaching eol in a few months
19:54:44 <clarkb> #undo
19:54:45 <openstack> Removing item from minutes: #topic PTG Planning
19:55:03 <clarkb> fungi: ya maybe we should make that clearer on the thread
19:55:13 <clarkb> basically call out that this is the first step in eventually turning it off completely
19:55:22 <fungi> sgtm
19:55:31 <fungi> i can reply on that thread
19:55:32 <clarkb> #topic PTG Planning
19:55:55 <clarkb> There will be a virtual PTG at the end of October. I think our three blocks of 2 hours across timezone boundaries seemed to work well last time
19:56:08 <corvus> maybe we can point the internet archive crawler at ask after we make it read-only to make sure it gets a complete copy
19:56:15 <clarkb> corvus: ++
19:56:27 <fungi> good idea
19:56:37 <clarkb> #link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here
19:56:48 <clarkb> I've yet to populate that etherpad with ideas, but will do so there when I get some time
19:56:57 <clarkb> feel free to add your own items too
19:57:06 <fungi> clarkb: yeah, i think the vptg worked well, same schedule this time is fine by me
19:57:12 <clarkb> #link https://www.openstack.org/ptg/ Registration is open too
19:57:31 <clarkb> ya I think my biggest question right now is if people think we want more (or less) time?
19:57:33 <fungi> i keep meaning to do that, thanks for the reminder
19:57:55 <clarkb> I'll assume three blocks of 2 hours unless I hear otherwise. I personally think that worked well for us
19:58:40 <clarkb> #topic Open Discussion
19:58:48 <clarkb> we have about a minute and a half for anything else you'd like to bring up
20:00:06 <clarkb> I guess that was it. Thank you everyone!
20:00:08 <clarkb> #endmeeting