19:01:08 #startmeeting infra
19:01:09 Meeting started Tue Feb 19 19:01:08 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 The meeting name has been set to 'infra'
19:01:19 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:23 o/ Hi!
19:01:32 o/
19:02:09 I seem to remember scribbling a note to link to the mailing list threads when I send those out instead. /me scribbles note bigger to do that
19:02:22 #topic Announcements
19:02:47 I have nothing to announce. Any announcements others would like to call out?
19:03:34 nothing springs to mind
19:04:07 I guess I should note the openstack TC election nomination period is now and ends shortly
19:04:22 so get your self-nomination pushed if that is something you would like to do
19:04:24 ~4.75 hours remaining
19:04:55 Then be on the lookout for ballots in the near future
19:05:12 #topic Actions from last meeting
19:05:20 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-02-12-19.01.txt minutes from last meeting
19:05:48 We didn't call this out as an action in the meeting notes, but ianw took the action of updating the letsencrypt spec to consider DNS-based ACME validation. This has been done and is the subject of our next topic
19:06:07 #topic Specs approval
19:06:13 #link https://review.openstack.org/#/c/587283/ LetsEncrypt Spec now updated with DNS ACME
19:06:26 I don't think this is ready for approval based on the good conversations this has started
19:06:58 It does seem we are making good progress through the conversation though. So please take a look if this interests you at all
19:07:16 and thank you ianw for getting that put together with some PoC ansible work to show how it might actually run in production
19:07:17 it seems to be zeroing in on a more concrete design
19:07:52 yes i've updated from comments yesterday, a small changelog is in the notes for ps14
19:08:22 i also updated the PoC to provide self-signed certs in a testing environment
19:08:48 (one thought for future development is that we could run our own LE, containerised)
19:09:04 for future use in CI I mean
19:09:49 ianw: as far as next steps go it seems like we are nearing where we need to start making some decisions and moving forward? do we think we might be able to put it up for approval next week, if we can get through another round of reviews and post a new ps with some decisions?
19:09:49 and responded to corvus' thoughts on breaking up base.yaml in https://review.openstack.org/637645
19:10:12 corvus: fungi ^ you've reviewed it recently too so your thoughts on how quickly we might be able to come to consensus are appreciated
19:10:56 clarkb: yes, i think so; i'll commit to responding to any review comments quickly so we can get things going
19:10:57 i think i'm generally in favor of what's written.
19:11:15 i would love one more update that said "we aren't going to run anything that touches proprietary rax dns"
19:11:53 i haven't looked at the diff from the most recent patchset yet
19:11:59 skimming now
19:12:06 as the purchasers of openstack.org certs for our services I'm ok with corvus' request and doing one last bulk purchase round
19:12:23 if that helps swing any opinions
19:12:25 but i'm at least happy enough that we're all on the same page that it's not something we'll use for any new services and will only shrink, that i can go along with what's written
19:13:22 alright, anything else on this before we move on? (thanks again to everyone that helped push this forward, ianw in particular)
19:13:26 my feeling is that it really is quite a small change, and we can have certificates for anything under openstack.org, which is something that we've talked about for a long time
19:13:47 having previously obtained the certs for openstack.org for a number of years too, i agree that generating them all at once for a renewal is less painful than adding new ones for services we add over time (and new services should be going into opendev.org anyway)
19:14:47 also, moving our existing http-only services to opendev.org before switching them to https shouldn't pose significant burden
19:15:40 seems like we should be able to sort that out in review. Please do review the spec. But I think we need to move to our next topic(s)
19:15:51 ++
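
Sketch of the DNS-based ACME validation the spec discusses, using certbot's manual DNS-01 hooks. This is illustrative only: the spec's proof of concept drives issuance from ansible instead, and the hook script paths and domain here are hypothetical.

    # Hypothetical DNS-01 issuance. The auth hook publishes the TXT
    # record certbot asks for; the cleanup hook removes it afterwards.
    certbot certonly --manual --preferred-challenges dns \
        --manual-auth-hook /usr/local/bin/acme-dns-add \
        --manual-cleanup-hook /usr/local/bin/acme-dns-del \
        -d graphite.opendev.org
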
19:15:53 #topic Priority Efforts
19:15:59 #topic Storyboard
19:16:09 fungi you have good news for us on this topic iirc
19:16:29 Storyboard prod and dev servers are now both running on xenial with local mysql DBs right?
19:16:53 last remaining cleanups are to configure backups?
19:17:16 oh, well we replaced the trusty server with a xenial deployment in production (and upped the flavor to help with performance), plus moved the database out of trove and onto the server itself
19:17:38 yay xenial!
19:18:01 #link https://review.openstack.org/637388 Clean up after StoryBoard replacement maintenance
19:18:05 \o/
19:18:10 good for another 2 years iirc (EOL)?
19:18:10 that's the cleanup change
19:18:27 and yeah, i still need to init the remote backups
19:18:29 prometheanfire: yup
19:19:03 also clarkb spotted that karbor has asked to move to storyboard
19:19:10 #link https://review.openstack.org/636574 Update karbor to use storyboard
19:19:57 and cinder is adding a new repo to sb to test it out too
19:20:08 #link https://review.openstack.org/637613 Add cinderlib project
19:20:15 cinderlib is a library to consume cinder drivers without an api service
19:20:48 yeah, i get the impression that'll be handy for some container orchestration system use cases
19:21:02 oVirt is also adopting it.
19:21:10 neat!
19:21:21 anything else to be aware of or help with in storyboard land?
19:22:02 If anyone has spare time to look at the attachments patches it'd be much appreciated :)
19:22:48 #link https://review.openstack.org/#/q/status:open+topic:story-attachments storyboard file attachment changes
19:23:41 nothing else on my side
19:23:48 Mine either
19:23:53 #topic Configuration Management Updates
19:24:21 Now that I've got pbx off my plate I intend to pick up the puppet 4 upgrades and futureparser changes again.
19:24:53 Also worth noting (because some had missed it) that we are now running a service with ansible + docker and no puppet. This service is the insecure image registry that will be used for zuul jobs that build docker images
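
A rough sketch of what such an insecure registry amounts to, using plain docker and the official registry image. The production service is deployed via ansible, so the port and storage path below are illustrative, not the real configuration.

    # "Insecure" here means no TLS: docker daemons that pull from it
    # must list it under insecure-registries in their daemon.json.
    docker run -d --restart=always --name registry \
        -p 5000:5000 \
        -v /var/registry:/var/lib/registry \
        registry:2
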
19:25:38 i had forgotten after reviewing many (most?) of those changes even
19:26:41 I think work here has slowed because it's starting to become our new normal. Though still more work to be done with puppet upgrades and other service flips to docker.
19:26:48 Are there any other items we need to call out on this topic?
19:27:26 i'm planning on having zuul-preview deployed that way too
19:27:39 i think it's easy and fast :)
19:27:52 ya once the underlying tooling is in place it seems to be pretty straightforward
19:28:05 and sounds like the plan for now is to also do gitea that way?
19:28:07 the big bootstrapping need was the image building wheel and we have that now
19:28:13 fungi: that's our next topic :)
19:28:26 * fungi returns to the present in that case
19:28:28 So maybe time to move on to that
19:28:42 #topic OpenDev
19:29:09 Last we met at this venue corvus expected we'd have the gitea cluster in a production-like state ready for us to really start interacting with it
19:29:21 reality happened
19:29:39 Then we did a couple gitea upgrades and discovered that gitea had never intended for shared-everything gitea clusters to "work", and they fixed it
19:29:55 long story short, you kind of need to run a single gitea process today, the way it is built
19:30:07 #link gitea bug for shared indexes https://github.com/go-gitea/gitea/issues/5798
19:30:18 to be fair it's the search engine that has this problem
19:30:32 but the search engine is a major selling point
19:30:45 so after a discussion in #opendev earlier today I think we are leaning towards running multiple shared-nothing gitea instances, similar to how we run multiple shared-nothing cgit instances today
19:31:07 which does mean independently replicating to them for now
19:31:37 ++
19:31:40 And longer term it sounds like they'll add support for elasticsearch-backed search indexes for the code, which will allow us to go back to the shared-everything cluster model in k8s
19:32:20 also, i think we didn't touch on this, but we'll likely need some manner of client affinity in the load balancing layer to deal with replication being out of sync between cluster members
19:32:47 (if memory serves we do this in the cgit haproxy instance today?)
19:32:50 i'd like to keep the k8s cluster up and running to continue work in parallel on deployment and testing tooling
19:33:11 fungi: yeah, i expect us to keep exactly the haproxy config we have now with docker-compose gitea.
19:33:12 fungi: we switched to a least-connections backend with haproxy but can go back to client affinity based on ip
19:33:25 iirc
19:33:31 corvus: basically just switching the backend ips really, right?
19:33:36 mordred: yep
19:33:40 \o/
19:34:01 clarkb: oh, did we not actually end up with any git fetch errors from dropping the client affinity?
19:34:11 in that case it's probably safe to do the same with gitea
19:34:11 though, maybe it's more complex because we're also changing hostnames... so... i dunno, maybe it's a duplicate haproxy.
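
The two balancing strategies being weighed above can be sketched as a hypothetical haproxy backend; the backend name, server names, and addresses are made up (3000 is gitea's default web port).

    backend gitea_https
        # current approach: route new connections to the least busy backend
        balance leastconn
        # alternative if replication lag causes stale results: pin each
        # client IP to one backend so it always sees a consistent replica
        # balance source
        server gitea01 192.0.2.10:3000 check
        server gitea02 192.0.2.11:3000 check
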
19:34:33 and actually...
19:34:43 do we want to host the docker gitea instances in vexxhost?
19:34:51 (that's where we put the k8s cluster)
19:34:57 is there a reason not to?
19:35:22 only reason i can think of is ipv6, which should be fixed before it matters to us.
19:35:40 but if we do, then we definitely want a new lb, since that should be in the same dc as the backends :)
19:36:05 have we gotten working reverse dns for v6 addresses in vexxhost yet?
19:36:27 fungi: we don't have v6 in vexxhost sjc1 yet. but it should be there RSN.
19:36:32 ahh
19:36:43 corvus: what was the reason for using sjc1 over montreal?
19:36:56 well, originally, because magnum was newer there :)
19:37:06 there is working ipv6 in the other region today
19:37:40 we could ask mnaser where he'd like us to put them :)
19:37:43 ++
19:37:56 either way, it'll be a new load balancer.
19:38:09 Other news in opendev land is we've started booting replacement servers with opendev dns names and CNAMEing openstack records to them
19:38:22 I updated the launch docs with some short notes on how this changes our server launch processes
19:38:37 please do propose updates to that little README if there are inconsistencies with reality
19:39:18 sweet
19:39:42 yeah, the storyboard servers are this way now
19:39:53 as is the new pbx
19:40:12 Anything else on opendev?
19:41:21 #topic General Topics
19:41:39 First up, frickler points out that I have been bad with emacs and gpg-agent and the secrets file
19:41:48 do we know why image sizes jumped? https://nb01.openstack.org/images/
19:42:29 me too
19:42:40 prometheanfire: we can follow up after the agenda'd topics
19:42:48 clarkb: I was also wondering whether we could somehow automate this to be safer, like having a cronjob that checks for running gpg-agents
19:43:01 we should write a /usr/local/bin script
19:43:06 frickler: or maybe create a small script to use to run emacs for that file
19:43:06 editpw or something
19:43:07 clarkb: k
19:43:08 corvus: jinx
19:43:11 I have also been bad at emacs
19:43:20 clarkb, corvus: ++
19:43:21 corvus: I think that solution will be easy for me to remember once i use it a couple times
19:43:27 frickler: ^ what do you think?
19:43:36 sounds good, yes
19:43:38 yeah. there's no way i'll remember "gpg-agent --daemon emacs"
19:43:59 was this an email? i've missed this
19:44:00 no one think about that too much.
19:44:08 re: gpg-agent
19:44:28 can we configure gpg-agent to run on-demand and terminate immediately by default?
19:44:30 ianw: no, frickler pointed it out in irc then added it to the meeting agenda
19:44:43 ianw: the tl;dr is that running emacs to open the gpg encrypted file starts a gpg-agent that never goes away
19:44:55 i want to say i've seen instructions for how to do that when building and signing debian packages in a chroot
19:45:15 ianw: and it caches the passphrase for an hour or so
19:45:19 fungi: the gpg-agent --daemon emacs command is the command version of that
19:45:28 fungi: so it is possible
19:45:40 oh right, yeah, from time to time i've killed gpg-agents after emacs went bananas and totally messed up the terminal too
19:46:25 fungi: we were unable to find any way to cause the correct behavior when we were looking at this in berlin
19:46:45 I think the editpw script solves this in a pretty straightforward way
19:46:49 so I'm good with that option
19:46:56 fungi: other than the gpg-agent --daemon emacs ... but if there is a way to just configure the system to not be terrible, that would be awesome
19:47:01 clarkb: ++
19:47:41 frickler: is that something you would like to push up?
19:48:02 yeah, i'm hunting for a config option there
19:48:19 ok, we've only got ~12 minutes so let's get to the next item
19:48:26 clarkb: I can give it a try, but if someone else wants to do it, I'm not too sad either
19:48:31 frickler: ok
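
A minimal sketch of the editpw wrapper proposed above, built around the "gpg-agent --daemon emacs" invocation from the discussion; the default secrets-file path is a placeholder.

    #!/bin/sh
    # editpw: open the encrypted secrets file in emacs under a throwaway
    # gpg-agent. The agent (and its cached passphrase) exits together
    # with the editor instead of lingering for an hour.
    exec gpg-agent --daemon emacs "${1:-$HOME/passwords.gpg}"
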
19:48:37 Next up, Trusty server upgrades
19:48:45 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:48:52 fungi and I have made recent progress on this
19:49:05 there are still a few outstanding (and potentially more difficult) servers in that list.
19:49:36 yeah, frickler just now approved the inventory addition for the openstackid.org xenial replacement so we're probably close to swapping it out
19:49:40 If you can grab one or two it will go quickly. Another thought was maybe a sprint sometime after openstack release candidate processes quiet down? (I think that is mid-March?)
19:50:15 we probably need to plan another in-place ubuntu release upgrade for lists.o.o
19:50:19 I can send out an email to try and organize a sprint if we think we'd sit down for a few days to focus on this. Maybe let me know in the infra channel today if you think you could help with a sprint
19:51:17 prometheanfire: ok, image size questions. The easiest way to start sorting that out would likely be to grab the qcow2s from before and after the size jump, and either nbd mount them or boot them and run du?
19:51:23 prometheanfire: I do not know why there is a size jump
19:51:39 most of that image size cost is git repos iirc. So it is possible we've got a git repo growing quickly
19:51:45 i will go back and see why we turned off the automatic size reporting in the dib logs
19:51:56 i'd like to let folks know that i'm continuing work on using the intermediate docker registry. still hitting bugs, but hopefully soon we'll be able to have speculative docker images, which means we can test changes to images in our deployment tooling before deploying them.
19:52:05 it was intended for exactly this :)
19:52:14 corvus: that is exciting
19:52:18 corvus: also yay for ansible bugs/behavior
19:52:31 corvus: \o/
19:52:31 #topic Open Discussion
19:52:35 clarkb: ya, will probably do that
19:52:37 we've started discussing open things anyway :)
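
The before/after comparison suggested above for the image size jump might look like this; the image filenames, nbd device, and mount point are placeholders.

    # Attach the qcow2 as a network block device and measure usage.
    sudo modprobe nbd max_part=8
    sudo qemu-nbd --connect=/dev/nbd0 ubuntu-xenial-before.qcow2
    sudo mkdir -p /mnt/image
    sudo mount /dev/nbd0p1 /mnt/image
    # largest directories first; cached git repos are the usual suspects
    sudo du -x --max-depth=3 /mnt/image | sort -rn | head -25
    sudo umount /mnt/image
    sudo qemu-nbd --disconnect /dev/nbd0

Repeating this for the post-jump image and diffing the two listings should show where the growth happened.
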
19:52:42 ptg schedule?
19:53:29 prometheanfire: I brought it up last week and have an etherpad to start putting ideas on
19:53:32 * clarkb finds link
19:53:48 #link https://etherpad.openstack.org/2019-denver-ptg-infra-planning
19:54:09 I think we're moving at a relatively quick pace on a lot of items so it is hard to know exactly what we'll want to work on at the PTG
19:54:17 but there should be plenty of topics when we get there
19:54:32 k
19:55:04 i think we should do some containerization, but i don't think we'll know what until much closer.
19:57:08 One thing I've struggled with (and maybe this means I need to try harder to use gertty again) is keeping track of outstanding reviews that need to be done for these various efforts. I think we have been reasonably good about posting stacks/topics that need eyeballs, but maybe I should look at using storyboard worklists again
19:57:27 this is where i got sidetracked with graphite; it's probably better to just do the easy upgrade to xenial, then tackle containerisation separately?
19:58:01 ianw: I think that is where we've ended up on a couple other upgrades.
19:58:03 clarkb: i try to make sure the topic is set on opendev-gerrit changes
19:58:22 it's not perfect
19:58:29 * corvus changes some topics
19:58:32 corvus: ya, maybe what I really need is a gerrit dashboard to collect a number of those topics for me
19:58:41 (I'm mostly just brainstorming out loud about this)
19:58:56 clarkb: gerrit-dash-creator might help too, certainly a shared one that highlights current important topics
19:59:11 well, we're supposed to have that in the specs page
19:59:13 http://specs.openstack.org/openstack-infra/infra-specs/
19:59:31 corvus: ya, you can fine-tune it a bit more with a dashboard though
19:59:32 that url seems broken
19:59:33 https://review.openstack.org/#/q/status:open+AND+(topic:storyboard-migration+OR+topic:opendev-gerrit)
19:59:46 like show all the changes with one +2 and no -1s to flush them quickly etc
19:59:53 sphinx's url highlighting may simply be broken
20:00:02 oh right, yeah, we could do a little fancier one that filters on the different votes
20:00:10 looks like it doesn't include the trailing ) in the hyperlink and instead shows it as text
20:00:12 sure, but we don't need that for priority efforts
20:00:19 corvus: that is true
20:00:20 there are never more than a handful of those
20:00:25 and we are at time
20:00:30 thanks clarkb!
20:00:36 Thank you everyone. I'll send you back to your regularly scheduled day
20:00:39 so what i'd love is if we kept up making sure the topics are set right
20:00:40 #endmeeting