19:01:19 #startmeeting infra
19:01:19 Meeting started Tue Jul 30 19:01:19 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:22 The meeting name has been set to 'infra'
19:01:24 o/
19:01:35 Good morning ianw
19:01:45 #link http://lists.openstack.org/pipermail/openstack-infra/2019-July/006431.html
19:01:52 #topic Announcements
19:02:14 I think everyone is having too much fun debugging web browser behavior. That is ok, I don't have any announcements
19:02:49 heh, "fun"
19:03:58 #topic Actions from last meeting
19:04:13 mordred: any progress on the github actions?
19:04:26 fwiw hogepodge was asking about the zuul mirror on github because someone was asking him
19:04:26 oh it's meeting time
19:04:28 no
19:04:36 so we may want to get that sorted soon to make it clearer to people
19:04:46 yes
19:05:04 #action mordred create opendevadmin github account
19:05:08 sorry - I keep bouncing off the github api part of it - I'll try harder to not bounce off
19:05:10 #action mordred clean up openstack-infra github org
19:05:24 mordred: is the scope of the problem small enough that we can click buttons? we had a lot of repos so probably not?
19:05:46 probably not - given how many repos we have
19:06:01 ya I guess that would be a lot of clicking
19:06:13 #topic Priority Efforts
19:06:17 what do we need to do there?
19:06:22 #undo
19:06:23 Removing item from minutes: #topic Priority Efforts
19:06:40 we're what, force-pushing "moved" messages to the repos?
19:06:48 corvus: archive all of the repos under https://github.com/openstack-infra
19:06:53 oh - you know ... yeah - I think I might have been fixating too much on the archive thing
19:07:01 we're archiving them?
19:07:05 just force-pushing "we've moved" is a bit easier
19:07:05 well I think archive does what we want?
19:07:22 it gives a banner that says "this repo is RO and archived and doesn't live here anymore" type of message
19:07:31 does it say where it does live?
19:07:32 how about I start with the force-push - and we can come back to the archive later?
19:07:41 i'd be fine with just deleting them, but maybe i'm not thinking of the users
19:07:54 corvus: I'm not sure if the message is configurable by the user
19:08:05 force-push (or really even just a regular push) is super easy and I can have that done pretty quickly
19:08:06 fungi: anything without a new location is no better than a delete from my pov
19:08:10 mordred: but ya, pushing something that says "the content is over there" works too. then if we archive it will just make that a bit more explicit
19:08:11 the archive message does not appear to be configurable when i looked
19:08:18 yeah
19:08:22 so - I'll work on force-push
19:08:26 so it would need to be readme/description/something
19:08:35 then later, when we feel like it, we can archive or not archive as we feel like
19:08:44 ok
19:08:45 fungi: yeah - I'm thinking a normal retire-this-repo type commit
19:08:53 leaving only a README with a "this is now elsewhere" message
19:09:02 pointing to the elsewhere
19:09:04 okay, then the 2 options i like are: 1) push readme + archive; 2) delete. the one thing i don't want to do is archive without readme.
19:09:22 I think push readme is the absolute easiest
19:09:37 (because archive without readme looks even more like the project terminated than just deleting it)
19:10:18 sounds like 1) is the current plan. mordred any reason that you don't think 1) will work?
19:10:54 cool, let's action that explicitly so we don't forget which way we decided to "clean up" the org :)
19:10:58 nope.
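[Editor's sketch: the "push a retirement README" option agreed on above could be scripted roughly like this. The README wording, the repo list file, and the opendev.org URL layout are assumptions for illustration, not the text or tooling that was actually used.]

```shell
#!/bin/sh
# Sketch of option 1 from the discussion above: replace each repo in the
# openstack-infra GitHub org with a README pointing at its new home.

retire_readme() {
    # $1: repo name; emit the replacement README contents
    cat <<EOF
This project is no longer maintained here.

The repository has moved to https://opendev.org/openstack-infra/$1
EOF
}

# For each repo (list obtained e.g. from the GitHub API into repos.txt),
# replace the contents with just the README and push. Commented out so
# this sketch has no side effects:
# while read -r repo; do
#     git clone "git@github.com:openstack-infra/$repo" "$repo"
#     cd "$repo"
#     git rm -r -q .
#     retire_readme "$repo" > README.md
#     git add README.md
#     git commit -m "Retire repository: moved to opendev.org"
#     git push origin HEAD:master
#     cd ..
# done < repos.txt

retire_readme example-repo
```
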
I am very confident in 1 - or at least the 1st part - and I think the second part is a get-to-it-later thing - since that's a new feature that hasn't even existed that long
19:11:21 so I don't think we have to say "we must use the new gh feature that lets you mark a repo readonly" - since those are readonly anyway
19:11:36 #action mordred Push commits to repos under github.com/openstack-infra to update READMEs explaining where projects live now. Then follow up with github repo archival when that can be scripted.
19:11:48 \o/
19:11:50 ++
19:11:50 thanks mordred
19:11:58 #topic Priority Efforts
19:12:00 yeah, the deeper i looked into the "archive" option the less convinced i became that it's super useful
19:12:04 (for us)
19:12:06 ++
19:12:12 #topic OpenDev
19:12:35 fungi has done a bunch of work over the last week or so to replace our gitea backends. Thank you fungi!
19:12:58 These gitea backends all have 80GB of disk now instead of 40GB. That should give us plenty of room for growth as repos are added and otherwise have commits pushed to them.
19:13:20 This process has also added 8GB swapfiles to address the OOMing we saw, and the images we used have properly sized ext4 journals
19:13:26 that should make the servers happier
19:13:57 and improve performance as well as stability/integrity
19:14:05 Some things we have learned in the process: the /var/gitea/data/git/.gitconfig.lock files can go stale when the servers have unhappy disks. When that happens all replications to that host fail
19:14:37 If we end up doing bulk replications in the future we want to double check that file isn't stale (check timestamp and dmesg -T for disk errors) before triggering replication
19:14:54 unfortunately from gerrit's perspective replication succeeds and it moves on, but the data doesn't actually update on the gitea server
19:15:14 fungi: ^ anything else you think we should call out from that process?
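[Editor's sketch: the pre-replication stale-lock check described above might look something like this. The lock path and the dmesg suggestion come from the discussion; the five-minute age threshold is an assumption.]

```shell
#!/bin/sh
# Sketch of the sanity check discussed above: before triggering bulk
# replication to a gitea backend, verify the git user's .gitconfig.lock
# isn't stale. A stale lock means replications silently fail on this host
# even though gerrit reports success.

LOCK=/var/gitea/data/git/.gitconfig.lock
MAX_AGE=300  # seconds; assumed threshold

check_stale_lock() {
    # $1: lock file path, $2: max age in seconds
    [ -e "$1" ] || { echo "ok: no lock file present"; return 0; }
    age=$(( $(date +%s) - $(stat -c %Y "$1") ))
    if [ "$age" -gt "$2" ]; then
        echo "WARNING: $1 is ${age}s old; likely stale, investigate before replicating"
        return 1
    fi
    echo "ok: lock is only ${age}s old"
}

check_stale_lock "$LOCK" "$MAX_AGE"

# Also check for recent disk errors, as suggested in the meeting:
# dmesg -T | grep -i 'i/o error'
```
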
19:15:21 we should also try to figure out where the sshd logs are going and fix that
19:15:26 ++
19:15:37 there is a good chance they go to syslog like the haproxy logs
19:15:39 cause i think that would have helped debugging (we learned about the problem via strace)
19:15:47 and if we mount /dev/log into the container we'll get the logs on the host syslog
19:15:59 yeah, nothing other than we also added health checks in haproxy
19:16:00 or maybe we can add an option to send them to stdout/stderr?
19:16:06 and got it restarting on config updates
19:16:07 because we run sshd in the foreground
19:16:28 though i don't know if that is compatible with its child processes
19:16:34 -e  Write debug logs to standard error instead of the system log.
19:16:39 that might work
19:16:46 yeah, or i think sshd_config can be set for it
19:16:51 either one
19:17:14 changes to the docker file should all be tested
19:18:05 #action infra Update gitea sshd container to collect logs (either via stderr/stdout or syslog)
19:19:01 #topic Update Configuration Management
19:19:17 mordred has been doing a bunch of work to build docker images for gerrit
19:19:33 I think we should now (or very soon once jobs run) have docker images for gerrit 2.13, 2.15, 2.16, and 3.0
19:19:53 mordred: is the next step in that process to redeploy review-dev using the 2.13 image?
19:20:30 clarkb: yes.
19:20:48 well - the next step will be writing a little config management to do that
19:20:51 but yes
19:20:55 exciting
19:21:53 ianw: on the ansible'd mirror side of things I had to reboot the new mirror in fortnebula that fungi built recently to get the afs mounts in place (it was failing the ansible afs sanity checks prior to that)
19:22:18 ianw: have all the fixes related to that merged? I thought I had approved the one fix you had called out. But maybe we need to add a modprobe to the ansible?
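[Editor's sketch: the two options floated above for getting the gitea sshd logs out of the container could look roughly like this in compose form. The service and image names here are placeholders, not the actual deployment config.]

```yaml
# Option A: mount the host's /dev/log so sshd's syslog output lands in
# the host syslog, as suggested above:
services:
  gitea-ssh:
    image: gitea/gitea:latest   # placeholder image name
    volumes:
      - /dev/log:/dev/log
    # Option B: run the foreground sshd with -e so its logs go to
    # stderr, where docker collects them (LogLevel can also be tuned
    # in sshd_config):
    command: /usr/sbin/sshd -D -e
```

Either option satisfies the #action recorded above; option B keeps everything inside the usual `docker logs` workflow.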
19:22:35 yes, afaik
19:23:07 one thing it could be is if dkms only builds the module for the latest kernel on the host but we haven't rebooted into that kernel yet?
19:23:08 yeah, afs clients will need a reboot partway through configuration (or maybe just a modprobe)
19:23:29 i believe dkms will do it for all installed kernels by default
19:23:31 in CI testing we do build and then test straight away
19:24:06 hrm, a behavior difference between cloud images and nodepool images maybe?
19:24:13 something to look at closer when we build more opendev mirrors I guess
19:24:14 maybe when ansible installed the openafs packages it didn't wait long enough for the dkms builds to complete?
19:24:23 maybe longer because more kernels?
19:24:33 fungi: it should wait for apt-get to return
19:24:39 maybe ... the dpkg doesn't return till it's done
19:24:55 hard to say at this point; as you say, something to watch with a new server
19:25:15 i didn't notice this doing the other rax openafs based opendev.org servers, iirc
19:25:48 Any other configuration management updates/bugs to call out?
19:27:04 Sounds like no
19:27:09 #topic Storyboard
19:27:22 fungi: diablo_rojo: anything to call out for storyboard?
19:27:39 i don't think we have anything new and exciting this week
19:27:55 Nothing new, except to beg again for sql help
19:28:25 mordred, would you have some time in the next week or so to look at the query logs and suggest some changes?
19:28:42 Better yet, make some changes..
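[Editor's sketch: one way to check the dkms theory discussed above on a new mirror. The module name and the sample line format are assumptions based on typical `dkms status` output, not captured from the servers in question.]

```shell
#!/bin/sh
# Sketch of a check for the openafs/dkms theory above: verify the kernel
# module was actually built for the kernel we are currently running, and
# hint at a modprobe/reboot if not.

module_built_for() {
    # $1: output of `dkms status`, $2: kernel version string
    printf '%s\n' "$1" | grep -q "$2.*: installed"
}

# On a real host you would capture: status=$(dkms status openafs)
# Sample line in the format dkms typically prints:
status="openafs, 1.8.0, 4.15.0-55-generic, x86_64: installed"

if module_built_for "$status" "$(uname -r)"; then
    echo "openafs module built for the running kernel"
else
    echo "no module for $(uname -r); try: modprobe openafs (or reboot into the built kernel)"
fi
```
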
19:29:13 diablo_rojo: yes - I will look at them in the next week or so
19:29:31 mordred, thank you thank you thank you
19:29:37 i can try to follow the earlier example of generating a fresh analysis of the slow query log and publishing that, at least
19:29:52 but my sql-fu doesn't run very deep
19:30:43 so beyond a git grep for some of the combinations of field names to see where those queries could be coming from (or guessing based on what they look like they're trying to do) i don't know that i'll be able to run many of them down
19:31:16 does sqlalchemy have a way to annotate queries with that info?
19:31:34 (I'm guessing no, because sql)
19:31:59 like comments would just get thrown out before the slow query log ever sees them
19:33:23 Sounds like that may be it? let's move on
19:33:26 #topic General Topics
19:33:40 Trusty server update progress
19:33:42 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:33:53 clarkb got wiki-dev puppeting!
19:33:55 I think puppet is running successfully on wiki-dev02 now
19:34:13 yeah, it's probably time to delete it and boot another one now
19:34:14 fungi: what is the next step there? testing that the wiki works?
19:34:31 ah ok, so start fresh, make sure that works, then maybe copy data from the prod wiki?
19:34:33 just to make sure things still get puppeted smoothly from the get-go
19:34:38 ++
19:34:44 and then exercise it and try adding a copy of the db
19:35:07 sounds good
19:35:08 i can try to poke at that some now that the gitea replacements are behind us
19:35:31 Separately, corvus has been adding features to make browsing logs in zuul's web ui possible
19:35:45 Those changes merged yesterday. And today there has been much debugging
19:36:02 I think this is the last major step in getting us to a point where logs.o.o can be a swift container
19:36:07 corvus: ^ anything to add to that?
19:36:32 at some point, this will become the default report url for jobs
19:36:50 so instead of linking directly to the log server when we report a job back, we'll link to zuul's build page
19:37:18 that will be great
19:37:20 It should give people more of what they expect if they've used a tool like Travis CI previously
19:37:23 ideally, i think we'd want to make that switch before we switch to swift (from a UX pov), but technically they are orthogonal
19:37:54 ya I think this work makes the swift change far less painful for our users because the zuul experience should remain the same
19:38:16 (because i think the transition of osla -> zuul-js is better than osla -> swift static -> zuul-js)
19:38:56 so yeah, i think debugging this, then flipping that switch, then swift are the next major steps
19:38:57 also the lack of autoindex if we point them directly at logs in swift, right?
19:39:10 this way zuul acts as the file index
19:39:13 fungi: we actually do get autoindex in swift if you toggle the feature
19:39:18 oh, cool
19:39:21 but the way zuul does it should be much nicer
19:39:24 fungi: we have static index generation to compensate for that, but i'd rather not use it
19:39:25 we just previously used swift's, which didn't
19:39:47 oh, also good point, i forgot we worked out uploading prebuilt file indices
19:39:48 well, we actually put some static generation into the swift roles so we didn't have to rely on that
19:40:33 but yeah, neither that, nor the static severity stuff, is better than osla, but i think that zuul-js is (or, at the least, no worse). so if we can do it in the order i propose, that's better
19:40:43 ++
19:40:47 i concur
19:41:08 Next up on the agenda is a quick cloud status update
19:41:12 one remaining blocker to doing them in that order is to add https for logs.openstack.org, yeah?
19:41:37 fungi: ya, or we serve the zuul links with http://
19:41:44 fungi: ah, yes. or remove https from zuul.
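[Editor's sketch: the "static index generation" mentioned above exists because a plain object store like swift won't produce directory listings itself. A minimal version of the idea looks like this; it is a rough illustration, not the actual upload-role code, which lives in the zuul swift roles.]

```shell
#!/bin/sh
# Minimal sketch of static index generation: write an index.html listing
# the files in a log directory before uploading it to swift, so browsing
# still works without server-side autoindex.

gen_index() {
    # $1: directory to index
    {
        echo "<html><body><h1>Index of $1</h1><ul>"
        for f in "$1"/*; do
            [ -e "$f" ] || continue
            name=$(basename "$f")
            echo "<li><a href=\"$name\">$name</a></li>"
        done
        echo "</ul></body></html>"
    } > "$1/index.html"
}

# Demo against a throwaway directory with placeholder log file names:
d=$(mktemp -d)
touch "$d/job-output.txt" "$d/zuul-info.txt"
gen_index "$d"
cat "$d/index.html"
```

With zuul's js dashboard acting as the file index, this kind of generated page becomes a fallback rather than the primary browsing UI.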
:|
19:42:07 I think we can do https://logs.opendev.org easily
19:42:11 then set the CORS headers appropriately
19:42:21 we should double check all the swifts we plan to use
19:42:41 oh, someone said that most of those should be under the swift api anyway, right? so should be https?
19:42:50 corvus: yes I would expect them to be https
19:43:00 we should double check but I'm not super worried about it
19:43:18 that would only leave rax, and technically that needs a bit more work anyway to support the cdn stuff
19:43:36 (er, i mean that would leave rax as something to investigate)
19:44:52 On the cloud test resources side of things, donnyd has rebuilt the fortnebula control plane and we've run into some networking trouble with the mirror after that. I'll look at that after the meeting
19:45:17 Once that is sorted out I think we are hopefully near a longer term setup with that cloud which we can use going forward
19:45:32 mordred: any news on MOC enabling service tokens?
19:45:38 (or whatever that term is I'm searching for)
19:47:08 We must've lost mordred
19:47:15 I'll try to follow up on that after the meeting too
19:47:36 Last up is PTG planning. Still quite a bit early, but if you have any ideas you can put them up at:
19:47:38 #link https://etherpad.openstack.org/p/OpenDev-Shanghai-PTG-2019
19:49:04 corvus: related to ^ I'm going to start sorting out the gitea stuff too
19:49:53 clarkb: thanks
19:50:04 And that was the agenda
19:50:10 #topic Open Discussion
19:51:07 did folks see the ml post about rget?
19:51:17 i know there were some replies
19:51:27 sounds like we have consensus to proceed
19:51:43 did not hear anything saying it was a bad idea
19:51:50 yeah, to reiterate here, it sounds like a worthwhile thing to participate in
19:51:54 #link http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008107.html
19:52:40 i'm a little unclear about the githubiness of that...
19:52:53 from what i can see, it doesn't look like it should be an issue
19:53:11 but the authors seem to think maybe there's a little more work? https://github.com/merklecounty/rget/issues/1
19:53:33 I think they've done some magic to handle github urls specially
19:53:38 but my read of it was that this wasn't required
19:53:46 we can feed it a tarballs.o.o url and validate that
19:54:06 at any rate, it seems like they don't want to require github, so any issues we run into we should be able to resolve
19:54:11 oh, that issue contradicts my read of it
19:54:17 but ya, seems they'll be likely to fix that for us
19:54:31 yeah, i think i'm going to go stick a static file on a private server and see what breaks :)
19:55:05 the way the certs end up in the certificate transparency log shouldn't prevent any domain from working
19:55:13 i have personal projects where i already publish sha256sums i can easily test with as well
19:55:15 (maybe the client isn't the big issue? maybe it's something the server does?)
19:55:19 they do mangle the path component and the hostname though
19:55:28 and with github it isn't 1:1; I guess that is what they have to sort out
19:58:03 I'll let everyone go get breakfast/lunch/dinner now.
19:58:06 Thank you all!
19:58:09 #endmeeting