19:01:10 #startmeeting infra
19:01:10 Meeting started Tue Apr 16 19:01:10 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:13 The meeting name has been set to 'infra'
19:01:21 #link http://lists.openstack.org/pipermail/openstack-infra/2019-April/006311.html
19:01:34 #topic Announcements
19:01:45 o/
19:02:07 Week after next is the summit and ptg. I expect we'll be skipping the meeting on April 30 as a result
19:02:35 If those not attending want to have the meeting I have no objection. I just won't be able to put an agenda together or run it that day
19:02:52 #topic Actions from last meeting
19:03:01 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-04-09-19.01.txt minutes from last meeting
19:03:14 There were no recorded actions that meeting and I believe all the actions from the prior meeting had been done
19:03:28 #topic Specs approval
19:03:56 I think we've all largely been focused on implementing the opendev git hosting spec, the config mgmt update spec, and the LE implementation, so nothing to add under this topic
19:04:02 #topic Priority Efforts
19:04:09 Let's dive right into the fun stuff
19:04:14 #topic OpenDev
19:04:19 whee!
19:04:40 I wrote down some specific questions in the agenda about our migration planned for friday
19:04:55 First up: we should probably set some specific times
19:05:17 I think my preference would be to get started as early in the day as people find reasonable so that we have the whole day to clean up after ourselves :)
19:05:28 first thing pacific time? what's that -- 1500?
19:05:36 ya, 1500 UTC is 8am pacific
19:06:29 as noted earlier, it's my 5th wedding anniversary so i'll have all my stuff done and up for review ahead of time, but will only be around off and on on friday, so will chip in when i can
19:06:53 though i plan to be around most of the weekend to help unbreak whatever we missed
19:07:30 mordred: 1500 work for you?
19:07:43 maybe we say something like "starting at 1500 UTC the infra/opendev team will begin work to perform the repo hosting migration. Expect outages off and on with gerrit and git hosting services through the day, particularly earlier in the day."
19:08:00 wfm
19:08:05 clarkb: sounds good
19:08:34 ok, I'll work to send some followup emails with those details
19:08:50 Next on my list is "are we ready? what do we need to get done before friday?"
19:09:00 i'm about to push up the script which *generates* the mapping of renames (i know that wasn't a task in the story, but it became necessary to correlate the dozen or so data sources used to make that decision) and will then send a link to a breakdown of that mapping to the relevant mailing lists
19:09:17 corvus: ++
19:09:20 and i'm about halfway through composing the script to do the repository edits themselves
19:09:41 ok, so we're waiting on the changes that generate those lists. I think we should also freeze new project creation at this point
19:09:45 which i'm planning to also benchmark with a full set of data from review.o.o
19:09:49 clarkb: ++
19:09:51 fungi: do you need anyone to jump in and pick up tasks, or are you okay?
19:09:52 it will just simplify fungi's list generation to not have to accommodate new projects
19:10:07 well, the list generation is dynamic
19:10:22 fungi: I know, but people will want to review them
19:10:29 but it relies on some data sources we may want to urge projects to quiesce, like openstack's projects.yaml in governance
19:10:42 i'm fine with freezing project creation for sure
19:11:15 just thinking that, for example, since folks create new projects before they get added to governance there's a bit of a race where we may rename something we shouldn't
19:11:39 ++
19:11:41 so maybe if jroll could encourage the tc to quickly flush any pending changes of that nature
19:12:21 Another thing that I think would be helpful is if we write down our plan in an etherpad like we've done in the past
19:12:42 there are a fair number of moving parts here (dns updates, renaming things in gerrit and in gitea, etc), so having that written down will be good
19:12:49 guh, sorry, missed this meeting (and have another meeting shortly)
19:12:51 fungi: will do
19:13:42 my head is currently unhappy so I'm not sure I'll get to writing down that plan today, but I can probably start on that tomorrow
19:13:54 if anyone else is interested in writing the steps down too feel free to start :)
19:13:55 the mapping generator is just down to me working out how to consume an ethercalc export on the fly now
19:14:48 We have a repo rename request on the meeting agenda wiki page as well. Is that something we want to incorporate into this transition or do in a followup?
19:15:55 it can be incorporated (i guess adding it to the spreadsheet would be one way) or we can save it to exercise the rename tooling after the mass migration
19:16:19 either works for me; i'll yield to fungi's opinion on that
19:16:37 yielding to fungi wfm.
19:17:16 the way i'm incorporating the renames ethercalc is as a final override, so adding it there would take care of incorporating it
19:18:08 hrm, actually looking back at my script, no it's not, unless it's not already part of another project
19:18:29 Other things to consider: airship is apparently trying to do a release on monday. But they never brought this up in the emails we sent or in these meetings so I'm not seeing that as a blocker
19:18:37 so maybe we hold it for the week after?
19:18:42 fungi: in that case saving it for later seems fine
19:18:56 yeah, i'm leaning toward that so i don't revisit my current logic
19:19:57 Anything else we should be thinking about while we've got everyone here?
19:20:29 * mordred feels like he should have more thoughts of things to think about, but can't think of any such thoughts
19:20:35 are we set on tls?
19:21:04 corvus: for gerrit I guess? fungi did you get a cert for gerrit when you got one for gitea?
19:21:16 yep, it's already in the certs dir
19:21:21 er, gitea was already done so the recent one must've been for gerrit. cool
19:21:40 pulled that one just in case we wanted it
19:22:02 ianw: there was talk of using LE -- is that ready, or should we stick with manual certs for now?
19:22:03 didn't want this to become a letsencrypt scramble on top of the opendev migration
19:22:52 corvus: yeah, best to leave that, it can be done asynchronously under less pressure. it's the puppet-ish deployment bits that will need thinking on
19:22:57 k
19:23:25 https://review.openstack.org/653108 needs approval :)
19:24:00 and we'll probably need a change to our apache config on review.openstack.org to redirect review.openstack to review.opendev ?
19:24:07 and i guess we should prepare a change to do redirects from review.openstack to review.opendev
19:24:16 clarkb: jinx
19:24:32 oh, and update the canonical name of the server
19:25:04 that's all i can think of
19:25:51 seems like a plan is coming together. This is exciting
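[editor's note: to make the redirect idea above concrete, here is a rough Ansible-flavored sketch. review.openstack.org was puppet-managed at the time, so this is not the actual production change; the file paths, handler name, and vhost contents are illustrative assumptions only.]

```yaml
# Sketch only: a vhost that permanently redirects the old name to the new one.
# Paths, names, and the omitted TLS directives are illustrative, not the real
# system-config/puppet change.
- name: Install redirect vhost for review.openstack.org
  copy:
    dest: /etc/apache2/sites-available/review-redirect.conf
    content: |
      <VirtualHost *:443>
        ServerName review.openstack.org
        # SSLEngine / certificate directives omitted for brevity
        # Redirect preserves the request path so existing change URLs keep working
        Redirect permanent / https://review.opendev.org/
      </VirtualHost>
  notify: reload apache2

- name: Enable the redirect vhost
  command: a2ensite review-redirect
  args:
    creates: /etc/apache2/sites-enabled/review-redirect.conf
  notify: reload apache2
# (a matching "reload apache2" handler is assumed to exist elsewhere)
```

Because a permanent redirect appends the original request path, deep links such as individual change URLs would continue to resolve under the new name.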
19:26:25 #topic Update Config Management
19:27:02 Continuing to make good progress on the puppet-4 transition. Ran into issues where there are no puppet-4 arm64 packages, so we'll be excluding the arm servers from puppet-4
19:27:17 i think we're almost done
19:27:37 nodepool servers are next up
19:27:46 ianw has started work on removing support for puppet3 as well
19:27:55 this allows us to update puppet modules to versions that require at least puppet 4
19:28:20 yeah, if we could look at the grafana review, i can babysit that one
19:28:51 #link https://review.openstack.org/#/c/652443/
19:29:02 that's on top of the skip-puppet3 fix too
19:29:27 on the docker side of things we've run into a few issues. Docker is still ignoring my PR for the ipv6 stuff (and the bug I filed). Skopeo has the same issue (I think because it uses docker's parsing rules for docker:// urls). Also our insecure-ci-registry filled up with images so we'll have to set up garbage collection or different disk backing to address that
19:29:56 i think dropping puppet 3 will be easier to do in one big sweep after everything is migrated
19:30:01 clarkb: i think skopeo may literally use the docker code for that
19:30:15 corvus: yup, I think so. I have a PR up to fix that code so hopefully eventually that gets addressed
19:30:52 i'm concerned the docker project may be effectively dead; every problem we've encountered has been described by a 2-year-old open issue
19:31:09 er, every *other* problem :)
19:31:18 yes, I've been wondering about that too, especially given the lack of response on my PR and bug
19:31:57 so we may want to investigate alternative tools for building, pushing, and hosting images
19:32:15 yeah; though if skopeo has the same problem.. yeesh
19:32:44 perhaps we should open a skopeo issue?
19:32:47 corvus: that said - we know people on the skopeo side and they're active and responsive
19:32:49 yeah
19:33:01 like - it seems like a place where we should be able to get the issue resolved
19:33:05 at least if they are faced with "docker is sitting on this bug" they may have more options
19:33:09 yup
19:33:12 yup
19:33:40 clarkb: would you mind opening that issue, then poke me so i can poke mordred to poke other people? :)
19:33:52 sure. I can point them at my docker side issue too
19:33:56 ya
19:34:25 anything else around config management updates?
19:36:01 #topic Storyboard
19:36:20 fungi: diablo_rojo: I've been so heads down on other things recently. Anything new here?
19:36:35 same
19:36:44 oh, getting close with the telemetry migration at least
19:37:04 performed a test import of all their deliverables from lp onto storyboard-dev last week
19:37:37 though right now the ptl lacks control of the lp projects to be able to close down bug reporting there, so is trying to coordinate access with former project members (the groups are not owned by openstack administrators)
19:38:48 of course they're not
19:39:31 fun
19:39:49 alright, let's move on; we have 20 minutes left for the remainder of our agenda
19:39:54 #topic General Topics
19:40:24 ianw: did you want to give us a quick update on the letsencrypt progress? graphite01.opendev.org is letsencrypted now. What are the next steps, etc.?
19:41:29 restarting services when certificates change is probably the last bit of global infrastructure
19:41:54 i have a proposal up using handlers, and clarkb has a different approach too
19:41:58 #link https://review.openstack.org/652801
19:42:13 I'm not tied to my approach and handlers are probably more ansibley
19:42:23 #link https://review.openstack.org/650384
19:42:29 i think we can sort that out in review
19:42:44 after that, it's really just service by service, deciding on how to deploy the keys into production
19:42:58 great progress. thank you for taking that on
19:43:14 I'm actually really happy with the setup we ended up with. I think it gives us good flexibility and isn't terribly complicated
19:43:19 ++
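[editor's note: for readers not following the two linked reviews, here is a minimal sketch of the handler-based approach being discussed: reload a service only when its certificate content actually changes. The host, file paths, and variable names are assumptions for illustration and do not reflect either review's actual code.]

```yaml
# Sketch only: notify a reload handler when the certificate file changes.
# The acme client is assumed to have left the renewed cert on the host already.
- hosts: graphite
  tasks:
    - name: Install renewed certificate where apache expects it
      copy:
        src: /etc/letsencrypt-certs/graphite01.opendev.org/fullchain.pem  # assumed acme output path
        dest: /etc/ssl/private/graphite01.opendev.org.pem
        remote_src: true
        owner: root
        mode: "0600"
      notify: reload apache2   # only fires when the copy task reports "changed"

  handlers:
    - name: reload apache2
      service:
        name: apache2
        state: reloaded
```

The appeal of the handler pattern is that the service restart logic lives next to the service's own configuration and only runs on actual certificate rotation, which is what "service by service" deployment above is getting at.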
19:43:46 next up is the trusty upgrade backlog
19:43:48 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:43:54 lists.openstack.org was upgraded last friday
19:44:06 The only minor hiccup was a stale lock file for the openstack vhost
19:44:11 once we cleared that out everything was happy
19:44:26 Need volunteers for static, ask, refstack, and wiki
19:44:34 er, just static, ask, and refstack now I think
19:44:42 thank you mordred and fungi for volunteering to do status and wiki
19:44:57 but also I expect we'll be idle on this task this week as the opendev transition is the big focus
19:45:09 so that can be puppet-4'd now too, right?
19:45:11 yah
19:45:11 could use a second review on the next bit for the wiki
19:45:13 cmurphy: yes
19:45:21 #link https://review.openstack.org/651352 Replace transitional package names for Xenial
19:45:25 clarkb: also - do we really need status as a separate host from static at this point?
19:45:33 mordred: that I don't know
19:45:44 I'll loop back around on that with folks in channel
19:45:50 mordred: status runs some services, at least some of elasticrecheck i think?
19:46:13 oh yeah, good point
19:46:30 The last item on the agenda comes from zbr
19:46:32 elasticrecheck and health are the only two things on status that continue to be useful - the rest are vestigial
19:46:45 Removal of ugo+rw chmod from default job definitions
19:46:51 zbr: ^ are you here to talk about this?
19:47:03 "Mainly this insanely unsafe chmod does prevent ansible from loading ansible.cfg files from our repos. That is happening from the 'Change zuul-cloner permissions' task which is run even for basic jobs like openstack-tox-py* ones." is the quote from the agenda
19:48:00 i'm all for it; is there a proposed solution?
19:48:25 iirc pabelanger and mordred intended to address this by not running zuul-cloner compat by default
19:48:34 so jobs would have to opt into the chmod silliness
19:48:35 the role says "Make repositories writable so that people can hardlink"
19:48:39 I think that distills down to "remove zuul-cloner from the base job", yeah?
19:48:50 ianw: ya, d-g in particular will hardlink those iirc
19:48:52 mordred: ya
19:49:01 yah
19:49:09 at least if it's on the same filesystem
19:49:15 right, so the action needed is for someone to figure out which jobs still need that, make sure they have it, then remove from base
19:49:48 is it still in base, or only in legacy-base?
19:49:55 for some reason i thought the latter
19:49:55 I believe it is still in base
19:50:08 because we realized we'd break a bunch of things a year or so ago when we first looked at moving it
19:50:12 alternatively, just make it be that ^ and let the chips fall where they may
19:50:14 ahh
19:51:18 yeah - I'm kind of inclined to just do the swap to make sure it's in legacy-base
19:51:25 if a job breaks from missing it - it's an easy fix
19:51:35 wfm
19:51:37 +1
19:51:58 maybe a post-summit/ptg activity considering all the other ways we are going to break the jobs on friday :P
19:52:06 but ya, that seems like a reasonable path forward
19:52:35 zbr: ^ hopefully that clears things up for you
19:52:45 clarkb: what is the role in question here?
19:52:55 fetch-zuul-cloner does not show up in base - but does show up in legacy-base
19:53:08 oh, maybe we have moved it then?
19:53:22 mordred: zbr claims openstack-tox-py* is affected by this
19:53:25 weird
19:53:28 are those legacy-based jobs?
19:54:06 no, it is still in base. I saw it in the zuul release job a few hours ago
19:54:35 pabelanger: really? it's not showing up in git-grep for me - so maybe I'm just bad at computers
19:54:41 https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/pre.yaml#L38
19:55:01 ah. I didn't grep opendev/base-jobs. whoops
19:57:02 ok, we are almost out of time. I think we have a general plan forward on this
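[editor's note: to make the plan above concrete, a rough sketch follows. The fetch-zuul-cloner role and the base/legacy-base job names are taken from the discussion, but the task body, paths, and example job below are approximations, not the actual zuul-jobs or opendev/base-jobs content. Per zbr's agenda note, the world-writable chmod is also what stops ansible from loading ansible.cfg out of the prepared repos.]

```yaml
# Sketch only: roughly the kind of task the compatibility shim runs today,
# i.e. the ugo+rw chmod being objected to. The real task in zuul-jobs may differ.
- name: Make repositories writable so that people can hardlink
  file:
    path: "{{ ansible_user_dir }}/src"   # illustrative path to the prepared repos
    mode: u=rwX,g=rwX,o=rwX
    recurse: yes
```

Once the shim lives only in legacy-base, a job that still needs zuul-cloner would opt in simply by parenting on legacy-base rather than base:

```yaml
# Sketch only: hypothetical job name; legacy-base keeps fetch-zuul-cloner in its pre playbook.
- job:
    name: example-legacy-job
    parent: legacy-base
```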
19:57:06 #topic Open Discussion
19:57:11 anything else before our hour is up?
19:57:35 I am looking to set up some github integration with Zuul for starlingx, eyeing the kata setup, a) is that the right model, b) is that best left for after Friday?
19:58:18 dtroyer: we can likely discuss it after the meeting in the infra channel
19:58:34 in general though I think we've found that we can't reasonably support github-hosted projects
19:58:44 (we learned a lot with kata)
19:58:54 ah, ok
20:00:07 there was an official statement we sent to the infra ml a few months ago?
20:00:08 and we are at time. Thanks everyone!
20:00:20 fungi: yup, I sent a note to the infra list to clarify our position on it
20:00:22 #endmeeting