19:01:04 #startmeeting infra
19:01:05 Meeting started Tue Jul 16 19:01:04 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:08 The meeting name has been set to 'infra'
19:01:40 #link http://lists.openstack.org/pipermail/openstack-infra/2019-July/006417.html
19:01:46 Find our agenda at ^
19:01:53 #topic Announcements
19:02:04 o/
19:02:14 Nothing major. As mentioned previously I'm largely afk today visiting people in town for oscon
19:02:33 o/
19:02:49 #topic Actions from last meeting
19:02:55 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-07-09-19.01.txt minutes from last meeting
19:03:03 #action mordred create opendevadmin github account
19:03:08 #action mordred clean up openstack-infra github org
19:03:21 mordred has apparently started on this second item and is writing a currently non-working script to do the work
19:04:09 #topic Priority Efforts
19:04:16 hopefully he's writing a working script, and its non-workingness is merely a transient state
19:04:24 fungi: that was how I understood it :)
19:04:29 #topic Update Config Management
19:05:14 has anyone noticed yet the shiny job running optimization for system-config when you update its .zuul.yaml?
19:05:57 Other than ^ I'm not sure we've made much progress on this item in the last week. Anyone have items to share related to this?
19:06:12 while i'm loath to admit it, i haven't had occasion to update system-config's .zuul.yaml yet
19:06:19 i think the gitea repo creation is ready to merge
19:06:20 (since the changes took effect)
19:06:52 corvus: do we need to squash any of the changes together because the changes only fully work near the end of the stack?
19:07:04 that will unblock the next steps in the zuul-and-related-systems playbook
19:07:13 (or an alternative would be to disable ansible for a bit, get everything merged, then turn ansible on)
19:07:35 clarkb: i think we can merge them as-is, the worst that can happen if the broken version runs is that it immediately errors out
19:07:54 gotcha so not failing and doing the wrong thing but failing safe. perfect
19:08:17 i'll start that going right now
19:08:20 The changes that I have reviewed so far look great (I need to re-review the parallelization change)
19:08:25 (since they have sufficient +2s)
19:09:13 as an added bonus I find the python script much more readable than the ansible yaml for this type of stuff (particularly in the manipulation of text/json)
19:09:24 :( i liked the yaml
19:09:47 clarkb: also, wait till you see the parallelized version before you say it's readable ;)
19:09:48 I always find it hard to reason about the loop constructs with item and subelements...
19:10:17 but yeah, we certainly wouldn't be doing *that* in ansible
19:10:22 in any case it is much faster: about half an hour serialized instead of ~4 hours
19:10:35 and i'm aiming for ~10m with the updates
19:10:37 and down to ~9 minutes if done in parallel? great improvement
19:10:44 i similarly find it hard to reason about threads and mutexes
19:10:51 so ymmv ;)
19:11:11 0:13:43
19:11:19 is how long it just took
19:11:31 so that node is a little slower than my workstation :)
19:11:49 That is probably a good transition into opendev topics
19:11:58 #topic OpenDev
19:11:58 quite the speedup!
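A minimal sketch of the thread-pool pattern under discussion, not the actual system-config script; create_repo() is a hypothetical placeholder for whatever per-project gitea API work the real script does:

    # Parallelize per-project repo creation instead of iterating serially.
    import concurrent.futures

    def create_repo(project):
        ...  # placeholder: perform the gitea API calls for one project

    def create_all(projects, workers=8):
        failures = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            # submit every project, then collect results as they finish
            futures = {pool.submit(create_repo, p): p for p in projects}
            for fut in concurrent.futures.as_completed(futures):
                try:
                    fut.result()
                except Exception as exc:
                    failures.append((futures[fut], exc))
        return failures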
19:11:58 still, all things considered, i think it probably means we can keep using the real data in the test jobs
19:12:12 corvus: nice
19:12:24 corvus: and at that speed can probably run it twice (to check the no-oping too)
19:12:37 yeah, that should be very fast
19:12:54 This relates to opendev because this is sort of the very first step in improving project creation work that we discussed at the PTG
19:13:31 We'd like to get to a place where orgs can largely self-manage the projects under their umbrellas, and making sure that doesn't take hours to reconcile is the very first step in that
19:13:35 so yay
19:14:17 Next we can have zuul trigger updates, then we can start distributing responsibility for managing those updates into the orgs ya?
19:14:51 yep
19:15:21 also we can thoroughly test changes to those mechanisms in a timely manner
19:15:59 (in addition to speeding up gitea server replacements)
19:16:19 so much win
19:16:27 fungi: those are actually done via database recovery currently to preserve the redirects
19:16:32 fungi: so that doesn't take very long
19:16:42 ahh, right, good point
19:16:43 (we have docs on that)
19:16:50 though we have plans for doing it from scratch
19:16:52 though we do have the redirects recorded in yaml
19:16:58 so we may get to that point in the future
19:17:07 yup eventually we should be able to make it more automated via direct restoration
19:17:09 should we need to rebuild them
19:17:31 The other opendev/gitea item I wanted to bring up was the one from today with the OOMing
19:18:04 Our gitea servers don't have swap and under some circumstances (this needs further investigation) they OOM which can kill git which can prevent replication of refs
19:18:39 I've got https://review.opendev.org/#/c/671102/ pushed up and if yall can take a quick look at that and decide it mostly does the right thing I can run that manually on gitea06. The reason for doing it on 06 is 06 has much more disk than the other gitea servers which unfortunately don't have much extra to spare
19:18:39 i am mildly concerned that one client can basically cause an arbitrary gitea backend to cancel random git processes
19:18:50 but that ought to help
19:19:02 and ya we'll need to investigate further to see what is causing that and hopefully work to fix it in gitea
19:19:29 Anything else opendev related or should we move on?
19:19:57 cacti says outbound bandwidth spiked to at least 150mbps when this happened
19:20:09 where the baseline was around 10mbps
19:20:27 so maybe client rate limits could help, if haproxy has those
19:20:31 journald should have logs of what requests we made then right?
19:20:36 corvus: ^ does gitea log that information?
19:20:38 probably
19:20:51 well, apache will, presumable?
19:20:56 er, presumably
19:21:02 i think gitea has it
19:21:04 we don't run apache with gitea
19:21:13 i don't think we have an apache (though we talked about it; we can add one if needed)
19:21:21 but right now, our setup is simple enough we can do without
19:21:27 oh, got it, so it's haproxy straight to the gitea sockets
19:21:37 yep, and gitea is terminating the tls
19:21:46 and haproxy won't know the nature of the requests because of not terminating ssl/tls
19:22:26 fungi: it should know l3 information which may be sufficient for rate limiting (though that may make NAT users sad)
19:22:59 right, just won't tell us what they were requesting that might be so voluminous (maybe they were just recloning every repo though)
19:23:31 pretty sure gitea is logging the reqs
19:23:38 cool so we can look into it further than
19:23:41 s/than/then
19:23:57 and docker-compose can spit those out?
19:24:03 The other thing I've done to help temporarily is triggered replication against all gitea backends (currently on 06)
19:24:14 fungi: ya or the docker command
19:24:20 (and possibly journalctl too)
19:24:21 cool, thanks
19:24:47 The extra replications are ensuring that all refs are in place, since some earlier pushes may have been OOMKilled
19:25:46 seems like a reasonable precaution until we get this under control
19:26:14 Sounds like we have rough ideas of how to address this further so let's move on
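For the log digging mentioned above, a small hedged sketch: tally requests per client IP from an access log to spot the heavy client behind a bandwidth spike. The log path and the assumption that the client IP is the first whitespace-separated field are hypothetical; adjust to however gitea/haproxy actually format their logs.

    import collections
    import sys

    def top_clients(log_path, limit=10):
        # count one hit per log line, keyed by the first field (assumed client IP)
        counts = collections.Counter()
        with open(log_path) as log:
            for line in log:
                fields = line.split()
                if fields:
                    counts[fields[0]] += 1
        return counts.most_common(limit)

    if __name__ == "__main__":
        for ip, count in top_clients(sys.argv[1]):
            print(count, ip)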
19:26:17 storyboard time
19:26:20 #topic Storyboard
19:26:31 diablo_rojo_phon is at OSCON so unsure if paying attention today
19:26:38 fungi: any new news?
19:27:14 we're going to try to do some story feature request subclassification on friday
19:27:27 but other than that, no major news i'm aware of
19:28:33 #topic General Topics
19:28:42 first up is trusty server upgrades
19:28:49 fungi: any luck with the wiki git repo situation?
19:28:59 I meant to help dig into that then got sucked into new cloud stuff
19:29:08 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:29:46 I think the new cloud stuff is largely under control/done so I can help with this tomorrow if still a problem
19:29:48 i haven't had a chance to dig into what might be causing it. weird though... basically files like /srv/mediawiki/w/extensions/Renameuser/.git which just contain "gitdir: ../../.git/modules/extensions/Renameuser"
19:30:06 is that what submodules look like?
19:30:25 seems like the reverse of a submodule
19:30:36 or maybe not
19:30:38 maybe it is subtree? which is like submodules in practice but implemented differently
19:30:56 but ya my hunch would be something around submodules
19:31:04 i was just thinking git needed another way to be used wrong
19:31:21 oh! we are cloning these into another git worktree, so... maybe!
19:31:30 corvus: I think they set out to make submodules work better :) and well ya
19:31:54 i think the repo we're cloning to /srv/mediawiki/w/ may include files like extensions/Renameuser/.git
19:32:13 i'll pursue that avenue
19:32:32 cool let me know if I can help tomorrow
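The gitdir pointer files described above are what submodule checkouts (and linked worktrees) leave behind: a plain .git file, rather than a .git directory, pointing back at the parent repository. A small sketch to list them, with the path taken from the example above:

    import os

    def find_gitdir_pointers(root):
        # walk the tree looking for ".git" entries that are regular files
        for dirpath, dirnames, filenames in os.walk(root):
            if ".git" in filenames:
                path = os.path.join(dirpath, ".git")
                with open(path) as f:
                    first = f.readline().strip()
                if first.startswith("gitdir:"):
                    yield path, first

    for path, pointer in find_gitdir_pointers("/srv/mediawiki/w"):
        print(path, "->", pointer)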
19:32:39 Next up is New cloud updates
19:32:47 Wanted to mention that fortnebula is now in full use
19:32:53 thank you donnyd for getting that set up
19:33:28 In the process we fixed some issues with our cloud launcher config, problems in glean/dib with centos/fedora network manager compatibility and probably other things I'm forgetting now
19:33:45 Hopefully that makes it easier to set up the MOC and linaro resources that we expect will be coming up soon
19:34:01 For linaro Kevin Zhang has reached out to me and says they are starting to build an openstack arm cloud on packethost
19:34:18 and will get in touch when that is working and they have sorted out IP addressing (I gave them our IP addr requirements)
19:34:29 corvus: mordred isn't here, do you know what is going on with MOC?
19:34:56 or knikolla?
19:35:07 o/
19:36:10 just curious if there was anything more to share today since it was mentioned last week
19:36:13 knikolla: mordred mentioned you were looking at providing moc resources to opendev's nodepool. have any details/updates?
19:36:15 if not I'll wait patiently :)
19:36:41 clarkb: I still have 40% more to go, just waiting on more gear
19:36:56 knikolla: i think mordred made an account and maybe next thing is to add the second project to that?
19:36:57 I plan to have 100 builders
19:36:59 i remember seeing a request for an account on our cloud from mordred, i assume someone on our side approved that
19:37:07 donnyd: awesome
19:37:08 ahh, cool
19:37:13 thanks knikolla!
19:37:54 if more than one project is needed, mordred can apply for another in the same way
19:38:14 ya typically we use two so that we can separate untrusted test nodes from trusted services like the mirror
19:38:23 good to know the process is the same just repeated
19:38:44 cool, sounds like progress and we should check with mordred
19:38:48 cool, does nodepool support application credentials?
19:39:29 knikolla: it supports authentication via anything configurable through clouds.yaml
19:39:31 so maybe?
19:39:55 then most probably yes.
19:39:56 clouds.yaml supports app creds
19:40:35 ok sounds like next step is sync with mordred, thanks
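On the application credentials question above, a hedged sketch: nodepool reads the same clouds.yaml that openstacksdk does, so one quick way to confirm an application-credential entry works is to connect with it directly. The cloud name and the clouds.yaml contents shown in the comment are placeholders, not the real MOC values.

    # ~/.config/openstack/clouds.yaml (roughly):
    #   clouds:
    #     moc:
    #       auth_type: v3applicationcredential
    #       auth:
    #         auth_url: https://keystone.example.org:5000/v3
    #         application_credential_id: <id>
    #         application_credential_secret: <secret>
    #       region_name: <region>
    import openstack

    # connect using the named clouds.yaml entry and list flavors as a smoke test
    conn = openstack.connect(cloud="moc")
    for flavor in conn.compute.flavors():
        print(flavor.name)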
19:40:43 Next on the agenda is managing our PPA
19:41:04 I've never personally had to manage a PPA so not sure I should lead this discussion but it was mentioned last week that it would be good for us to formalize the process a bit more
19:41:13 maybe that means docs or full on automation or some balance in between
19:41:28 ianw: ^ you've done much of the work there so probably have thoughts
19:41:34 yesterday I wrote up some docs at
19:41:37 #link https://review.opendev.org/#/c/670952/
19:42:06 if people want to make roles/jobs to build and sign debs and put them into the ppas with dput or whatever, that seems just fine
19:42:40 Thank you for the docs that seems like a good place to start
19:42:56 corvus: maybe you can review that change and think about whether you'd like more (like automation)
19:42:57 i would say, for things like the afs work ... you're taking an upstream tarball and then shoe-horning it into the existing (older) debian package. it's a lot of manual fiddling, and you've got to have a pretty good idea of how debian packaging works
19:43:04 if ianw is away, and someone says "we need a new openafs package" i'd like to know what to do...
19:43:24 hrm ya in that case automating the shoehorning might be a good idea
19:43:42 maybe that's just a script we can run periodically?
19:44:23 i mean, i don't need a general guide, it's more, specifically, what commands do i run to make an updated openafs package?
19:44:56 (that could be a job, i'm ambivalent about that)
19:46:01 we actually have precedent for this: https://docs.openstack.org/infra/system-config/nodepool.html#vhd-util
19:46:34 gotcha so it's that bit of documentation. FWIW I find the other bit ianw wrote useful too (particularly the stuff around perms)
19:46:46 ianw: so maybe we can add something like the vhd package build docs too and start from there?
19:47:13 the equivalent of that for openafs i think would be sufficient
19:47:31 if the openafs source package includes a debian/watch file it may be as simple as running uupdate in the source tree
19:48:31 that seems like a reasonable path forward. We only have 12 minutes left so I'll continue as the last item is fairly important too
19:48:47 fungi: it can be, but then we've also done quite a lot of backporting at times for things like arm64 support. the complexity definitely varies based on what is being done.
19:48:57 (also, is that vhd-util stuff completely obsolete now?)
19:49:07 corvus: no we still use it to build rax images
19:49:08 given rra used to handle the openafs uploads for debian, the package is probably fairly easy to update like that
19:49:32 and yeah, patching is always going to be the gotcha
19:49:52 though you can probably drop diffs into the debian/patches/ directory
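Returning to the "what commands do i run" question above, a rough sketch of the simple no-backports case; every version, URL, directory, and PPA name here is a placeholder, and the real update may still need the manual shoe-horning described earlier.

    import subprocess

    def run(cmd, cwd=None):
        # echo and run one command, stopping on failure
        print("+", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)

    # 1. fetch the new upstream tarball next to the unpacked source tree
    #    (URL and version are placeholders)
    run(["wget", "https://example.org/openafs-X.Y.Z-src.tar.bz2"])
    # 2. from inside the existing debianized source tree, merge the new
    #    upstream tarball into the packaging (a debian/watch file helps)
    run(["uupdate", "../openafs-X.Y.Z-src.tar.bz2"], cwd="openafs-old")
    # 3. build a signed source package from the updated tree
    run(["debuild", "-S", "-sa"], cwd="openafs-X.Y.Z")
    # 4. upload the source package to the team PPA (name is a placeholder)
    run(["dput", "ppa:example-team/example-ppa", "openafs_X.Y.Z-1_source.changes"])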
19:50:11 We have been asked if we would like to attend the PTG as a team. I know I'll be going and expect fungi to be there as well and think mordred and corvus are going too. A rough headcount will be helpful for planning purposes but I think even with a small group we should have a chunk of time to work on opendev type things (this was useful in denver)
19:50:32 ++
19:50:33 Also I know that we'd like to invite some of the gitea developers to have some space there too
19:50:43 so I'll be requesting space for them as well
19:50:56 i don't have a visa yet, and travel late in hurricane season is always a big question mark for me, but i will be there if at all possible
19:51:11 I have until early august to fill out the survey but figure I'll probably send it in sometime this week
19:51:27 so if you think you might be going just let me know so I can adjust the headcount numbers appropriately
19:52:29 #topic Open Discussion
19:52:45 We have a few minutes for any other business that you want to bring up
19:53:36 china exceeds my travel tolerance by, like, a whole lot :)
19:54:05 mine too, but i've learned to repress my feelings
19:54:30 speaking of docs
19:54:33 #link https://review.opendev.org/669602
19:54:42 #link https://review.opendev.org/668833
19:54:57 that's letsencrypt docs i wrote, and also updates to mirror-update docs to reflect the new server
19:55:08 ianw: thanks!
19:55:32 note we're exporting the logs of the rsync runs now -> http://mirror.ord.rax.opendev.org/logs/rsync-mirrors/
19:55:50 that should be helpful to people who want to contribute mirrors but previously had no way to see what was happening
19:55:52 excellent
19:56:12 and if you want to see them in your browser
19:56:16 #link https://review.opendev.org/670934
19:56:18 fixes that :)
19:56:30 /
19:56:50 oh, and for that to apply
19:56:51 #link https://review.opendev.org/670927
19:56:55 probably better to link to http://files.openstack.org/mirror/logs/rsync-mirrors/
19:57:16 Shrews: thankfully there is a direct flight from seattle to shanghai
19:57:39 fungi: hrm yes, i guess that will need a mimetype update too
19:58:13 that way you don't have to pick a random mirror to look at the central logs from the mirror updater
19:58:43 alternatively, we can just point them at the really trivial docs on how to set up your machine as an openafs client
19:58:56 and they can use whatever local tools they want to view those logs over afs
19:59:14 many options :)
19:59:15 And we are just about at time. Thank you everyone!
19:59:20 thanks clarkb!
19:59:36 Feel free to follow up on any and all topics here in #openstack-infra or at openstack-infra@lists.openstack.org
19:59:39 #endmeeting