19:01:12 #startmeeting infra
19:01:13 Meeting started Tue Jan 29 19:01:12 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:17 The meeting name has been set to 'infra'
19:01:28 I'll give it a couple more minutes for people to trickle in before starting
19:01:35 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:15 o/ Hi!
19:04:34 hello :) so far just us. I may just run through the agenda to get it recorded and we can see where that takes us
19:04:40 #topic Announcements
19:04:56 I am going to be on a plane during next week's meeting (February 5th)
19:05:14 we'll need a volunteer to chair the meeting. fungi should be back too so may be willing to do that if others are not
19:05:59 Other than that fungi is afk this week and I get on my first leg of the trip early tomorrow morning so it may start to be quiet around here in the near future
19:06:06 then back to normal sometime later next week
19:06:25 oops what'd i miss?
19:06:38 hello
19:06:51 corvus: just me pointing out we'll need a volunteer meeting chair for next week
19:07:03 clarkb: enjoy safe travels
19:07:05 and that starting tomorrow it will likely get really quiet as fungi is on vacation and I'm headed to fosdem
19:07:13 returning to normal some time next week
19:07:14 i volunteer fungi
19:07:21 I second
19:07:32 mordred is also gone for the rest of this week
19:07:42 with so many absences, i will probably hide under my desk
19:07:56 maybe change my nick or something
19:07:57 nah, now's the time to party
19:08:16 i'm pretty sure i saw the letters 'pto' near ianw's name recently
19:08:24 ya
19:08:50 #topic Actions from last meeting
19:08:55 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-01-15-19.01.txt minutes from last meeting
19:09:15 I don't think it was officially recorded as a new action last week but I think we are still waiting on docs update for dns management from corvus?
19:09:18 #link dns docs update https://review.openstack.org/633569
19:09:24 oh woot
19:09:24 wait concluded!
19:09:41 I have added that to my queue as I should be able to review that before I put clothes in a bag
19:10:43 considering the general lack of quorum I'll skip over specs approvals this week (I think the only outstanding one is the anomaly detection in logs and I haven't had a chance to look at it recently)
19:10:49 #topic Priority Efforts
19:11:01 #topic Storyboard
19:11:16 fungi got a new xenial storyboard-dev server up and running against a local database
19:11:25 (at least I believe it switched the db)
19:11:37 keep an eye out for changes to flip over production to a similar setup
19:12:18 SotK has also started to stub out the attachment storage code if you are interested in what that will look like
19:12:36 #topic Update Config Management
19:12:42 o/
19:13:10 corvus: cmurphy: last week I didn't get to pay much attention to changes on this front and this week I'm trying to catch up on opendev things. Anything we should be looking at?
19:13:47 clarkb: i have two patches still to pull the upgrade trigger https://review.openstack.org/629667 https://review.openstack.org/630391
19:14:24 cmurphy: ok maybe when I am in your timezone we can get through that first one
19:14:40 clarkb: okay
19:14:46 i'll be in brussels too
19:14:57 main thing on the container side is the new jobs/system for building images. that's described in a post to openstack-discuss, and should be all in place now in system-config.
19:14:57 corvus: care to review https://review.openstack.org/#/c/629667/ and cmurphy and I can try to get that rolling from brussels?
19:15:37 corvus: and that is building images for gerrit and zuul now?
19:15:51 so basically we have a nice way to build either first-party or third-party images
19:16:03 yes, gerrit being an example of a third-party image, zuul first-party
19:16:18 i'm continuing to sketch out running a service from that image with this: https://review.openstack.org/630406
19:17:00 #link https://review.openstack.org/#/c/630406/ WIP change to run service from our built docker images
19:18:11 i think that approach works (as the previous patchset illustrates), so i'm sure if anyone got around to doing that for a simpler service, they could land such a change sooner :)
19:18:23 nice
19:19:25 #info https://review.openstack.org/#/c/630406/ illustrates method that could be applied to simpler service to land docker based service changes sooner
19:19:42 #topic OpenDev
19:20:30 On the opendev front ttx has been working to get kata its own tenant in zuul. I've been helping with some of the foundation work to make that possible including base jobs repo content and a top level opendev zuul website
19:20:53 After the meeting I intend to go get a cert sorted out for zuul.opendev.org so that we can get that deployed
19:21:04 I expect that zuul itself will quickly follow into becoming its own tenant
19:21:32 zuul or kata?
19:21:42 anteaya: well both. Zuul shortly after kata I expect
19:21:49 okay thank
19:21:51 s
19:22:45 Additionally I've been asked to sort out https://etherpad.opendev.org hosting. I think the way we want to approach this is to use the existing setup but redirect etherpad.openstack.org to etherpad.opendev.org. Then all existing pad urls will work but we'll start pushing people to the new thing. The one gotcha here is we have to set up ssl for both openstack.org and opendev.org for that redirect to
19:22:46 work
19:22:58 if anyone is interested in sorting the config management for that please let me know, I can help with that as I am able
19:23:28 is that higher priority than opendev gerrit?
19:23:29 Based on the work above I figured after fosdem I would put together a list of TODOs for work like that so we can track it better and give people a sense of when things will be available under the "neutral" name
19:23:36 corvus: I don't think so
19:23:36 it feels like two other services have jumped the queue.
19:23:40 can you expand on the 'I've been asked' part?
19:23:53 mysterious direction makes me nervous
19:24:22 corvus: I've begun to see the gerrit and zuul stuff as more of a unit as they are coupled together quite a bit.
19:24:36 anteaya: the rust vmm folks looking to use etherpads
19:24:58 okay thanks, knowing where requests come from helps me
19:25:07 corvus: I do think that gerrit and zuul are the higher priority as they represent the significant road blocks. But I do also think we can likely work these in parallel if people are wanting to poke at etherpad
19:25:43 clarkb: i think zuul and gerrit are separable, which is why i wrote http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html about gerrit only.
19:26:00 i think it's a huge amount of work and will require attention from the whole team at once
19:26:07 corvus: thats fair.
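[A minimal sketch of the etherpad.openstack.org to etherpad.opendev.org redirect discussed at 19:22:45, assuming an Apache-based vhost on the existing server. The file name and certificate paths are illustrative assumptions only, not the actual production configuration, and the etherpad.opendev.org vhost would still need its own certificate for the target name.]

    # /etc/apache2/sites-available/etherpad-redirect.conf (hypothetical)
    <VirtualHost *:443>
        ServerName etherpad.openstack.org
        SSLEngine on
        # The openstack.org name still needs a valid cert so browsers will
        # follow the redirect without warnings.
        SSLCertificateFile    /etc/ssl/certs/etherpad.openstack.org.crt
        SSLCertificateKeyFile /etc/ssl/private/etherpad.openstack.org.key
        # Redirect preserves the rest of the path, so existing pad URLs keep working.
        Redirect permanent / https://etherpad.opendev.org/
    </VirtualHost>
    # enable with: a2ensite etherpad-redirect && systemctl reload apache2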
19:26:18 i don't think we are at a place where we can suggest people go and pick off little things
19:27:08 or, at least, i don't want to be the one person working alone on the hardest problem
19:27:28 corvus: I mention it because I know the gerrit stuff can be overwhelming to a lot of people, but there may still be interest in helping somewhere. If we'd rather hold the line on getting the big things out of the way as a single collective first I'm good with that too
19:28:18 do we have a place for identifying prioritization requests from certain groups?
19:28:38 if someone brand new came out of the woodwork and showed up with an etherpad change, great. i'm just a little worried about splitting the attention of the folks who have the context to do the big stuff.
19:28:45 anteaya: no, I think that would potentially be a secondary function of me trying to write down all the things that need to happen then we can prioritize from there
19:29:25 okay thank you
19:29:29 corvus: this is probably a good time to talk about productionizing gitea then. Are you happy with the current version of things running there?
19:29:47 i wanted to ask the rest of the team if they were :)
19:29:48 I guess the docker image building piece was a prereq to ^ but we've got that sorted out now
19:30:00 yeah, we could go into production with gitea now
19:30:15 mordred says there is a new version of rook which doesn't require the flexvolume driver or something
19:30:29 the upshot of which is that we might be able to reconsider running on magnum
19:30:55 my time with magnum for the nodepool k8s cluster has me leaning against wanting to do that
19:31:18 so that's a big thing i'd like to get feedback on -- would we rather run our own k8s with the ansible modules as we are doing in the POC, or try again with magnum
19:31:20 it runs on top of a deprecated distro release and doesn't have enough disk space to upgrade k8s on top of itself
19:31:49 it is slightly more work to deploy our own cluster, but that way we get up to date distro hosts as well as better control over disk usage
19:32:33 okay, so if we lean toward those ansible modules -- it's like the first bit of ansible that we're running from an upstream source.... are we okay running it out of the github repo, or do we want to make our own fork?
19:33:05 corvus: maybe pin to a sha1 from the github source? similar to how we've pulled puppet modules from github
19:34:03 the other two questions I had were did we end up running an ha k8s cluster (if not do we want to test that first?) and how comfortable are we with ceph disk consumption in the current setup?
19:34:49 okay. so i think what we probably want to do is: re-build the poc with new rook, verify it all works, then update the ansible changes to pin a sha, then i think we can land them. i can send an email with that to the infra list since this meeting is somewhat sparsely attended.
19:35:02 clarkb: the k8s cluster is not ha, but i don't think it's necessary
19:35:44 corvus: that plan sounds good to me
19:36:03 corvus: then maybe aim for a switch ~february 11?
19:36:11 (similar to openstack, if the control plane is absent, you just can't change things, but they keep running. the individual services in the cluster will be ha)
19:36:26 s/will be/are, in our current config
19:37:08 and yeah, i think the ceph disk usage is tolerable. i don't have the numbers handy, but we have everything replicated and had plenty of headroom, with the poc being only half our expected prod size
19:37:55 usage: 78 GiB used, 240 GiB / 318 GiB avail
19:37:57 I guess the switch can be done mostly transparently, we update dns and bam
19:38:25 the gitweb links in gerrit config likely need to be updated, that is straightforward
19:38:58 corvus: that is without the refs/changes refs right?
19:39:14 which is likely to be the biggest user facing change in all of this.
19:39:22 i think we should view gitea being in production as a not-very-public step. i don't think we should switch anything else out yet, there's still quite a bit of planning to be done with redirects, etc.
19:39:53 so basically, bringing the new system online in parallel with the old, then next we flesh out the changes to actually start switching things to it.
19:40:05 got it. That makes sense
19:40:17 that will probably involve things like deciding on dates, making announcements, etc.
19:40:30 and yes, that omits refs/changes
19:40:36 corvus: from our perspective I guess it's more about putting a line in the sand about not rebuilding from scratch for each thing, but rolling forward?
19:40:45 yep
19:41:01 sgtm
19:41:05 and i think we can go ahead and serve that from the real hostname, since that isn't otherwise in use
19:41:13 ++
19:42:05 i think that's it for the gitea-related stuff i had
19:42:39 I think that will also make it a bit easier for other roots to poke at it since it will be a "stable" set of names and tooling we can refer to
19:43:06 corvus: were there any other gerrit related items worth talking about?
19:43:12 yes and will make the process of creating the redirects easier if we have something to point to
19:43:16 (I think the bulk of the work so far has been in gitea)
19:43:47 nope, that's it
19:43:54 #topic General topics
19:43:55 still a bunch of stuff to do on http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html#work-items :)
19:44:29 corvus: I can ping qa team about git:// urls
19:44:37 may help to be in europe for that too :)
19:44:44 Moving on
19:44:48 that would be great, thx
19:45:06 #link https://etherpad.openstack.org/2019-denver-ptg-infra-planning Infra/OpenDev PTG Planning document
19:45:26 I created that link and have just formalized it as part of the meeting agenda but it doesn't have content yet.
19:45:53 As we go through this opendev work if we find things that we want to be in person for or have to deprioritize for whatever reason maybe throw it on that etherpad as a potential work item
19:46:08 then as we get closer to the PTG we can reduce the list to what makes sense (if necessary)
19:46:18 Mostly wanted a place for ideas to go early, but no rush
19:46:37 Next is intermediate container image artifacts
19:46:39 #link http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-January/000718.html Docker registry or log server
19:47:03 that was a message to zuul-discuss, but there's kind of a question for openstack-infra/opendev buried in there...
19:47:23 i'd like to implement that plan (i've nearly completed the required change to zuul itself)
19:47:49 and i'd like to use it in our system. that either means writing some ansible roles to export/import docker images from the logserver
19:47:58 or running a docker registry
19:48:38 fwiw I've long pushed against the docker registry because while it seemed simple, in practice no one got around to figuring out how it would look from a distributed standpoint. One way around that is to centralize things (which our log server would be doing anyway)
19:49:10 One thing we'll need to keep in mind is disk usage, though I expect that to be small if we aren't responsible for the base image layers too
19:49:13 it would be a central registry just used by jobs to stash intermediate artifacts, but it would be publicly accessible for reads. the only tags for images on the server would be weird, like "pipeline_check_change_1234", so not very useful to the general public.
19:49:38 i think with the registry option, we probably would end up with base layers on there
19:50:33 the docker image registry also officially wants to run out of a container so that may be a good simple first service to run that way
19:51:17 corvus: we could also proxy cache the central registry for reads to make use of the region local caches
19:52:07 (with the logserver, i think we can edit them out, but not so much with the registry)
19:52:47 corvus: and we are not planning to use this as the registry of record right? that will remain dockerhub or quay etc?
19:52:55 (it's purely an implementation detail of CI jobs)
19:53:09 correct
19:53:15 final published images go to dockerhub
19:53:58 in that case having a registry fronted by the proxy caches seems like it could be nice because then the tooling mimics the "production" type image publishing
19:54:12 we won't have to shim in the export/import from fileserver bits
19:54:22 yep, that's why i lean toward this way
19:54:38 and yes, i think with the approach sketched out in the change above, actually running the registry will be simple now
19:55:01 i think we can consider all the data ephemeral too, so we don't need a lot of planning around robustness
19:55:41 the main goal here (and why i'm heading out on this tangent) is to get things to the point where we can start doing depends-on changes to image builds
19:55:45 I'd be happy to go down that path and if we find problems with it we have a half implemented fallback in the form of the log server
19:56:14 several of the things we've been doing lately we've had to fall back on the pattern of actually landing changes before proceeding with the next step
19:56:22 everything is so slow that way :(
19:56:49 and just overcommunicate that it shouldn't be used as a source of production deployments
19:57:25 i'm sure *someone* is going to do it, but if we name the registry "insecure-temporary-registry.opendev.org" at least they'll be the ones who look bad, and not us.
19:57:32 :)
19:57:50 (also, i expect us to aggressively prune images from it)
19:57:58 ha ha ha
19:58:01 But ya keeping the tooling similar to real workloads is useful there so ++ to that plan
19:58:22 and being able to 'docker run' something that was built in check could be a powerful dev/debugging tool.
19:58:46 okay, i'll proceed with the run-a-registry plan
19:58:55 #topic Open Discussion
19:59:03 and now ~1 minute for anything else that slipped through
19:59:32 have a productive fosdem clarkb and cmurphy and anyone else attending
19:59:58 pour one out for the rest of us :)
20:00:03 or, better yet, just drink another one
20:00:05 #endmeeting
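[A rough, hypothetical sketch of the run-a-registry plan agreed above: run the upstream registry image in a container and let jobs stash intermediate images under change-specific tags. The hostname reuses the joke name from the discussion; the port, volume path, and image/tag names are illustrative assumptions only.]

    # Run the upstream registry image as a container; its contents are
    # treated as ephemeral CI artifacts and pruned aggressively.
    docker run -d --name ci-registry --restart=always \
        -p 5000:5000 \
        -v /var/lib/ci-registry:/var/lib/registry \
        registry:2

    # A build job could stash its intermediate image under a change-specific tag...
    docker tag gitea:dev insecure-temporary-registry.opendev.org:5000/gitea:pipeline_check_change_1234
    docker push insecure-temporary-registry.opendev.org:5000/gitea:pipeline_check_change_1234

    # ...and a dependent job, or someone debugging locally, could pull and run it.
    docker pull insecure-temporary-registry.opendev.org:5000/gitea:pipeline_check_change_1234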