19:01:04 <clarkb> #startmeeting infra
19:01:05 <openstack> Meeting started Tue Jul 16 19:01:04 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:08 <openstack> The meeting name has been set to 'infra'
19:01:40 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-July/006417.html
19:01:46 <clarkb> Find our agenda at ^
19:01:53 <clarkb> #topic Announcements
19:02:04 <ianw> o/
19:02:14 <clarkb> Nothing major. As mentioned previously I'm largely afk today visiting people in town for OSCON
19:02:33 <corvus> o/
19:02:49 <clarkb> #topic Actions from last meeting
19:02:55 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-07-09-19.01.txt minutes from last meeting
19:03:03 <clarkb> #action mordred create opendevadmin github account
19:03:08 <clarkb> #action mordred clean up openstack-infra github org
19:03:21 <clarkb> mordred has apparently started on this second item and is writing a currently non-working script to do the work
19:04:09 <clarkb> #topic Priority Efforts
19:04:16 <fungi> hopefully he's writing a working script, and its non-workingness is merely a transient state
19:04:24 <clarkb> fungi: that was how I understood it :)
19:04:29 <clarkb> #topic Update Config Management
19:05:14 <clarkb> has anyone noticed yet the shiny job-running optimization for system-config when you update its .zuul.yaml?
19:05:57 <clarkb> Other than ^ I'm not sure we've made much progress on this item in the last week. Anyone have items to share related to this?
19:06:12 <fungi> while i'm loath to admit it, i haven't had occasion to update system-config's .zuul.yaml yet
19:06:19 <corvus> i think the gitea repo creation is ready to merge
19:06:20 <fungi> (since the changes took effect)
19:06:52 <clarkb> corvus: do we need to squash any of the changes together because the changes only fully work near the end of the stack?
19:07:04 <corvus> that will unblock the next steps in the zuul-and-related-systems playbook
19:07:13 <clarkb> (or an alternative would be to disable ansible for a bit, get everything merged, then turn ansible on)
19:07:35 <corvus> clarkb: i think we can merge them as-is, the worst that can happen if the broken version runs is that it immediately errors out
19:07:54 <clarkb> gotcha so not failing and doing the wrong thing but failing safe. perfect
19:08:17 <corvus> i'll start that going right now
19:08:20 <clarkb> The changes that I have reviewed so far look great (I need to rereview the parallelization change)
19:08:25 <corvus> (since they have sufficient +2s)
19:09:13 <clarkb> as an added bonus I find the python script much more readable than the ansible yaml for this type of stuff (particularly in the manipulation of text/json)
19:09:24 <corvus> :( i liked the yaml
19:09:47 <corvus> clarkb: also, wait till you see the parallelized version before you say it's readable ;)
19:09:48 <clarkb> I always find it hard to reason about the loop constructs with item and subelements...
19:10:17 <corvus> but yeah, we certainly wouldn't be doing *that* in ansible
19:10:22 <clarkb> in any case it is much faster: about half an hour serialized instead of ~4 hours
19:10:35 <corvus> and i'm aiming for ~10m with the updates
19:10:37 <clarkb> and down to ~9 minutes if done in parallel? great improvement
19:10:44 <fungi> i similarly find it hard to reason about threads and mutexes
19:10:51 <fungi> so ymmv ;)
19:11:11 <corvus> 0:13:43
19:11:19 <corvus> is how long it just took
19:11:31 <corvus> so that node is a little slower than my workstation :)
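(For context on the approach discussed above, here is a minimal sketch of parallelized repo creation against the Gitea REST API using a thread pool; the URL, token, project list, and helper name are illustrative placeholders and not the actual system-config script.)

    #!/usr/bin/env python3
    # Sketch: reconcile a list of projects against a Gitea backend in
    # parallel so the full list takes minutes instead of hours.
    import concurrent.futures

    import requests

    GITEA_URL = 'https://localhost:3000'   # assumed backend URL
    API_TOKEN = 'changeme'                 # assumed admin API token
    PROJECTS = ['opendev/system-config', 'zuul/zuul']  # normally read from projects.yaml

    def ensure_repo(project):
        """Create the repo under its org if it does not already exist."""
        org, name = project.split('/', 1)
        session = requests.Session()
        session.headers['Authorization'] = 'token %s' % API_TOKEN
        resp = session.get('%s/api/v1/repos/%s' % (GITEA_URL, project))
        if resp.status_code == 404:
            resp = session.post(
                '%s/api/v1/org/%s/repos' % (GITEA_URL, org),
                json={'name': name, 'private': False})
            resp.raise_for_status()
        return project

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        for done in pool.map(ensure_repo, PROJECTS):
            print('reconciled %s' % done)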
19:11:49 <clarkb> That is probably a good transition into opendev topics
19:11:58 <clarkb> #topic OpenDev
19:11:58 <fungi> quite the speedup!
19:11:58 <corvus> still, all things considered, i think it probably means we can keep using the real data in the test jobs
19:12:12 <clarkb> corvus: nice
19:12:24 <clarkb> corvus: and at that speed can probably run it twice (to check the nooping too)
19:12:37 <corvus> yeah, that should be very fast
19:12:54 <clarkb> This is related to opendev because this is sort of the very first step in improving the project creation work that we discussed at the PTG
19:13:31 <clarkb> We'd like to get to a place where orgs can largely self-manage the projects under their umbrellas, and making sure that doesn't take hours to reconcile is the very first step in that
19:13:35 <clarkb> so yay
19:14:17 <clarkb> Next we can have zuul trigger updates, then we can start distributing responsibility for managing those updates into the orgs ya?
19:14:51 <corvus> yep
19:15:21 <fungi> also we can thoroughly test changes to those mechanisms in a timely manner
19:15:59 <fungi> (in addition to speeding up gitea server replacements)
19:16:19 <fungi> so much win
19:16:27 <clarkb> fungi: those are actually done via database recovery currently to preserve the redirects
19:16:32 <clarkb> fungi: so that doesn't take very long
19:16:42 <fungi> ahh, right, good point
19:16:43 <clarkb> (we have docs on that)
19:16:50 <corvus> though we have plans for doing it from scratch
19:16:52 <fungi> though we do have the redirects recorded in yaml
19:16:58 <corvus> so we may get to that point in the future
19:17:07 <clarkb> yup eventually we should be able to make it more automated via direct restoration
19:17:09 <fungi> should we need to rebuild them
19:17:31 <clarkb> The other opendev/gitea item I wanted to bring up was the one from today with the OOMing
19:18:04 <clarkb> Our gitea servers don't have swap and under some circumstances (this needs further investigation) they OOM which can kill git which can prevent replication of refs
19:18:39 <clarkb> I've got https://review.opendev.org/#/c/671102/ pushed up and if y'all can take a quick look at that and decide it mostly does the right thing I can run that manually on gitea06. The reason for doing it on 06 is that it has much more disk than the other gitea servers, which unfortunately don't have much extra to spare
19:18:39 <fungi> i am mildly concerned that one client can basically cause an arbitrary gitea backend to cancel random git processes
19:18:50 <fungi> but that ought to help
19:19:02 <clarkb> and ya we'll need to investigate further to see what is causing that and hopefully work to fix it in gitea
19:19:29 <clarkb> Anything else opendev related or should we move on?
19:19:57 <fungi> cacti says outbound bandwidth spiked to at least 150mbps when this happened
19:20:09 <fungi> where the baseline was around 10mbps
19:20:27 <fungi> so maybe client rate limits could help, if haproxy has those
19:20:31 <clarkb> journald should have logs of what requests were made then right?
19:20:36 <clarkb> corvus: ^ does gitea log that information?
19:20:38 <fungi> probably
19:20:51 <fungi> well, apache will, presumably?
19:21:02 <corvus> i think gitea has it
19:21:04 <clarkb> we don't run apache with gitea
19:21:13 <corvus> i don't think we have an apache (though we talked about it; we can add one if needed)
19:21:21 <corvus> but right now, our setup is simple enough we can do without
19:21:27 <fungi> oh, got it, so it's haproxy straight to the gitea sockets
19:21:37 <corvus> yep, and gitea is terminating the tls
19:21:46 <fungi> and haproxy won't know the nature of the requests because of not terminating ssl/tls
19:22:26 <clarkb> fungi: it should know L3 information which may be sufficient for rate limiting (though that may make NAT users sad)
19:22:59 <fungi> right, just won't tell us what they were requesting that might be so voluminous (maybe they were just recloning every repo though)
19:23:31 <corvus> pretty sure gitea is logging the reqs
19:23:38 <clarkb> cool so we can look into it further then
19:23:57 <fungi> and docker-compose can spit those out?
19:24:03 <clarkb> The other thing I've done to help temporarily is trigger replication against all gitea backends (currently on 06)
19:24:14 <clarkb> fungi: ya or the docker command
19:24:20 <clarkb> (and possibly journalctl too)
19:24:21 <fungi> cool, thanks
19:24:47 <clarkb> The extra replications are ensuring that all refs are in place in case any earlier replication pushes were OOM-killed
19:25:46 <fungi> seems like a reasonable precaution until we get this under control
19:26:14 <clarkb> Sounds like we have rough ideas of how to address this further so lets move on
19:26:17 <clarkb> storyboard time
19:26:20 <clarkb> #topic Storyboard
19:26:31 <clarkb> diablo_rojo_phon is at OSCON so unsure if paying attention today
19:26:38 <clarkb> fungi: any new news
19:27:14 <fungi> we're going to try to do some story feature request subclassification on friday
19:27:27 <fungi> but other than that, no major news i'm aware of
19:28:33 <clarkb> #topic General Topics
19:28:42 <clarkb> first up is trusty server upgrades
19:28:49 <clarkb> fungi: any luck with the wiki git repo situation?
19:28:59 <clarkb> I meant to help dig into that then got sucked into new cloud stuff
19:29:08 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:29:46 <clarkb> I think the new cloud stuff is largely under control/done so I can help with this tomorrow if still a problem
19:29:48 <fungi> i haven't had a chance to dig into what might be causing it. weird though... basically files like /srv/mediawiki/w/extensions/Renameuser/.git which just contain "gitdir: ../../.git/modules/extensions/Renameuser"
19:30:06 <fungi> is that what submodules look like?
19:30:25 <fungi> seems like the reverse of a submodule
19:30:36 <fungi> or maybe not
19:30:38 <clarkb> maybe it is a subtree? which is like submodules in practice but implemented differently
19:30:56 <clarkb> but ya my hunch would be something around submodules
19:31:04 <corvus> i was just thinking git needed another way to be used wrong
19:31:21 <fungi> oh! we are cloning these into another git worktree, so... maybe!
19:31:30 <clarkb> corvus: I think they set out to make submodules work better :) and well ya
19:31:54 <fungi> i think the repo we're cloning to /srv/mediawiki/w/ may include files like extensions/Renameuser/.git
19:32:13 <fungi> i'll pursue that avenue
19:32:32 <clarkb> cool let me know if I can help tomorrow
19:32:39 <clarkb> Next up is New cloud updates
19:32:47 <clarkb> Wanted to mention that fortnebula is now in full use
19:32:53 <clarkb> thank you donnyd for getting that set up
19:33:28 <clarkb> In the process we fixed some issues with our cloud launcher config, problems in glean/dib with centos/fedora network manager compatibility and probably other things I'm forgetting now
19:33:45 <clarkb> Hopefully that makes it easier to set up the MOC and linaro resources that we expect will be coming up soon
19:34:01 <clarkb> For linaro Kevin Zhang has reached out to me and says they are starting to build an openstack arm cloud on packethost
19:34:18 <clarkb> and will get in touch when that is working and they have sorted out IP addressing (I gave them our IP addr requirements)
19:34:29 <clarkb> corvus: mordred isn't here, do you know what is going on with MOC?
19:34:56 <fungi> or knikolla?
19:35:07 <knikolla> o/
19:36:10 <clarkb> just curious if there was anything more to share today since it was mentioned last week
19:36:13 <fungi> knikolla: mordred mentioned you were looking at providing moc resources to opendev's nodepool. have any details/updates?
19:36:15 <clarkb> if not I'll wait patiently :)
19:36:41 <donnyd> clarkb: I still have 40% more to go, just waiting on more gear
19:36:56 <corvus> knikolla: i think mordred made an account and maybe next thing is to add the second project to that?
19:36:57 <donnyd> I plan to have 100 builders
19:36:59 <knikolla> i remember seeing a request for an account on our cloud from mordred, i assume someone on our side approved that
19:37:07 <clarkb> donnyd: awesome
19:37:08 <fungi> ahh, cool
19:37:13 <fungi> thanks knikolla!
19:37:54 <knikolla> if more than one project is needed, mordred can apply for another in the same way
19:38:14 <clarkb> ya typically we use two so that we can separate untrusted test nodes from trusted services like the mirror
19:38:23 <clarkb> good to know the process is the same just repeated
19:38:44 <corvus> cool, sounds like progress and we should check with mordred
19:38:48 <knikolla> cool, does nodepool support application credentials?
19:39:29 <clarkb> knikolla: it supports authentication via anything configurable through clouds.yaml
19:39:31 <clarkb> so maybe?
19:39:55 <knikolla> then most probably yes.
19:39:56 <cmurphy> clouds.yaml supports app creds
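(For reference on the exchange above, a minimal sketch of what an application-credential entry in clouds.yaml looks like; the cloud name, auth_url, region, and credential values are placeholders.)

    clouds:
      moc:
        auth_type: v3applicationcredential
        auth:
          auth_url: https://keystone.example.org:5000/v3
          application_credential_id: "REPLACE_WITH_ID"
          application_credential_secret: "REPLACE_WITH_SECRET"
        region_name: RegionOne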
19:40:35 <clarkb> ok sounds like next step is sync with mordred, thanks
19:40:43 <clarkb> Next on the agenda is managing our PPA
19:41:04 <clarkb> I've never personally had to manage a PPA so not sure I should lead this discussion but it was mentioned last week that it would be good for us to formalize the process a bit more
19:41:13 <clarkb> maybe that means docs or full on automation or some balance in between
19:41:28 <clarkb> ianw: ^ you've done much of the work there so probably have thoughts
19:41:34 <ianw> yesterday I wrote up some docs at
19:41:37 <ianw> #link https://review.opendev.org/#/c/670952/
19:42:06 <ianw> if people want to make roles/jobs to build and sign debs and put them into the ppas with dput or whatever, that seems just fine
19:42:40 <clarkb> Thank you for the docs that seems like a good place to start
19:42:56 <clarkb> corvus maybe you can review that change and think about whether you'd like more (like automation)
19:42:57 <ianw> i would say, for things like the afs work ... you're taking an upstream tarball and then shoe-horning it into the existing (older) debian package.  it's a lot of manual fiddling, and you've got to have a pretty good idea of how debian packaging works
19:43:04 <corvus> if ianw is away, and someone says "we need a new openafs package" i'd like to know what to do...
19:43:24 <clarkb> hrm ya in that case automating the shoehorning might be a good idea
19:43:42 <clarkb> maybe that's just a script we can run periodically?
19:44:23 <corvus> i mean, i don't need a general guide, it's more, specifically, what commands do i run to make an updated openafs package?
19:44:56 <corvus> (that could be a job, i'm ambivalent about that)
19:46:01 <corvus> we actually have precedent for this: https://docs.openstack.org/infra/system-config/nodepool.html#vhd-util
19:46:34 <clarkb> gotcha so it's that bit of documentation. FWIW I find the other bit ianw wrote useful too (particularly the stuff around perms)
19:46:46 <clarkb> ianw: so maybe we can add something like the vhd package build docs too and start from there?
19:47:13 <corvus> the equivalent of that for openafs i think would be sufficient
19:47:31 <fungi> if the openafs source package includes a debian/watch file it may be as simple as running uupdate in the source tree
19:48:31 <clarkb> that seems like a reasonable path forward. We only have 12 minutes left so I'll continue as the last item is fairly important too
19:48:47 <ianw> fungi: it can be, but then we've also done quite a lot of backporting at times for things like arm64 support.  the complexity definitely varies based on what is being done.
19:48:57 <corvus> (also, is that vhd-util stuff completely obsolete now?)
19:49:07 <clarkb> corvus: no we still use it to build rax images
19:49:08 <fungi> given rra used to handle the openafs uploads for debian, the package is probably fairly easy to update like that
19:49:32 <fungi> and yeah, patching is always going to be the gotcha
19:49:52 <fungi> though you can probably drop diffs into the debian/patches/ directory
19:50:11 <clarkb> We have been asked if we would like to attend the PTG as a team. I know I'll be going and expect fungi to be there as well and think mordred and corvus are going too. A rough headcount will be helpful for planning purposes but I think even with a small group we should have a chunk of time to work on opendev type things (this was useful in denver)
19:50:32 <corvus> ++
19:50:33 <clarkb> Also I know that we'd like to invite some of the gitea developers to have some space there too
19:50:43 <clarkb> so I'll be requesting space for them as well
19:50:56 <fungi> i don't have a visa yet, and travel late in hurricane season is always a big question mark for me, but i will be there if at all possible
19:51:11 <clarkb> I have until early august to fill out the survey but figure I'll probably send it in sometime this week
19:51:27 <clarkb> so if you think you might be going just let me know so I can adjust the headcount numbers appropriately
19:52:29 <clarkb> #topic Open Discussion
19:52:45 <clarkb> We have a few minutes for any other business that you want to bring up
19:53:36 <Shrews> china exceeds my travel tolerance by, like, a whole lot  :)
19:54:05 <fungi> mine too, but i've learned to repress my feelings
19:54:30 <ianw> speaking of docs
19:54:33 <ianw> #link https://review.opendev.org/669602
19:54:42 <ianw> #link https://review.opendev.org/668833
19:54:57 <ianw> that's letsencrypt docs i wrote, and also updates to mirror-update docs to reflect the new server
19:55:08 <clarkb> ianw: thanks!
19:55:32 <ianw> note we're exporting the logs of the rsync runs now -> http://mirror.ord.rax.opendev.org/logs/rsync-mirrors/
19:55:50 <ianw> that should be helpful to people who want to contribute mirrors but previously had no way to see what was happening
19:55:52 <fungi> excellent
19:56:12 <ianw> and if you want to see them in your browser
19:56:16 <ianw> #link https://review.opendev.org/670934
19:56:18 <ianw> fixes that :)
19:56:50 <ianw> oh, and for that to apply
19:56:51 <ianw> #link https://review.opendev.org/670927
19:56:55 <fungi> probably better to link to http://files.openstack.org/mirror/logs/rsync-mirrors/
19:57:16 <clarkb> Shrews: thankfully there is a direct flight from seattle to shanghai
19:57:39 <ianw> fungi: hrm yes, i guess that will need a mimetype update too
19:58:13 <fungi> that way you don't have to pick a random mirror to look at the central logs from the mirror updater
19:58:43 <fungi> alternatively, we can just point them at the really trivial docs on how to set up your machine as an openafs client
19:58:56 <fungi> and they can use whatever local tools they want to view those logs over afs
19:59:14 <ianw> many options :)
19:59:15 <clarkb> And we are just about at time. Thank you everyone!
19:59:20 <fungi> thanks clarkb!
19:59:36 <clarkb> Feel free to follow up on any and all topics here in #openstack-infra or at openstack-infra@lists.openstack.org
19:59:39 <clarkb> #endmeeting