19:03:39 #startmeeting infra
19:03:40 Meeting started Tue Mar 14 19:03:39 2017 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:43 The meeting name has been set to 'infra'
19:03:45 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:12 #topic Announcements
19:04:23 i don't have any this week
19:04:35 as always, feel free to hit me up with announcements you want included in future meetings
19:04:56 #topic Actions from last meeting
19:05:06 #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-03-07-19.01.htm
19:05:30 pabelanger to send ML post to get more feedback on our current run-tox playbooks / role
19:05:38 i want to say i read it cover to cover
19:06:02 fungi: then say it!
19:06:34 #link http://lists.openstack.org/pipermail/openstack-infra/2017-March/005230.html Feedback requested for tox job definition
19:06:39 looks done to me
19:06:40 o/
19:07:06 pabelanger: per yesterday's zuul meeting, you're no longer blocked on that, right?
19:07:56 fungi: right, we are merged
19:08:23 #link http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-03-13-22.02.log.html#l-120 Status updates: Zuul sample jobs
19:08:25 cool
19:08:57 #topic Specs approval: PROPOSED Zuul v3: remove references to swift (jeblair)
19:09:26 #link https://review.openstack.org/443984 Zuul v3: remove references to swift
19:09:30 oh thanks
19:09:35 i was looking for that link under my sandwich
19:10:02 this is pretty short. we talked about it at the ptg, and a little in irc since then
19:10:31 the gist is that we think sending logs to swift is a thing people will still probably want to do, but we don't need built-in support in zuul for it
19:10:37 any feel for whether there are downstreams relying on that feature?
19:10:48 so we can drop that from the spec, and later add it back to what we're calling the 'standard library'
19:10:51 ahh, right, can just be in a role/playbook
19:11:07 seems entirely non-contentious to me
19:11:29 anyone object to opening council voting on this until 19:00 utc thursday?
19:11:59 i think based on informal conversations, it's ready for vote
19:12:27 #info Council voting is open on "Zuul v3: remove references to swift" until 19:00 UTC Thursday, March 16
19:12:41 looks like it also depends on some silly whitespace fixes
19:12:53 they are absolutely critical.
19:13:11 which are now approved
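For context on the "standard library" idea mentioned above, a log-upload role living outside Zuul proper might look roughly like the sketch below. The container name, credential setup, and variable names are illustrative assumptions rather than any agreed-on role.

    # Rough sketch only: a playbook task such a role could use to push build
    # logs to Swift via the python-swiftclient CLI. "log_container" and the
    # OS_CLOUD value are hypothetical; the log path variable is illustrative.
    - name: Upload build logs to Swift
      command: swift upload {{ log_container }} {{ zuul.executor.log_root }}
      environment:
        OS_CLOUD: logs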
19:13:57 #topic Specs approval: PROPOSED Zuul v3: update job trees to graphs (jeblair)
19:14:19 #link https://review.openstack.org/443985 Zuul v3: update job trees to graphs
19:14:38 this also has a follow-up clarification which i don't think needs its own vote, but it may be helpful to look at it since it makes the example more clear
19:14:58 #link https://review.openstack.org/445022 Zuulv3: clarify job dependencies example
19:15:16 this one is substantial -- it changes the job and project-pipeline definition syntax
19:15:37 it also has an implementation based on a patch which was proposed to the master branch some time ago
19:15:40 so you can see it in action
19:15:59 while a small part of me feels like this is scope creep, it was apparently stated that this would be a part of zuul v3 and as it's basically already written, i don't see any reason to object
19:16:33 yeah, we probably should have put at least a placeholder in the spec earlier to say "we're going to do this but we don't know what the syntax will look like yet". sorry about that.
19:17:00 before v3.0 is definitely the time to do this though, since it's a major configuration syntax change. we won't have any users left if we put them through two of those.
19:17:24 right, that's the best argument in favor for me. much harder to do after the big bang syntax change
19:17:46 anyone object to opening council voting on this until 19:00 utc thursday?
19:18:05 (the main thing to note in the spec update is that we lose the ability to structure jobs in a yaml tree. so you don't *visually* see the dependencies in the config file. but you are able to express them fairly easily, and of course it allows more topologies than before.)
19:18:55 no objection here. I think i have already voted positively on this one
19:19:11 #info Council voting is open on "Zuul v3: update job trees to graphs" until 19:00 UTC Thursday, March 16
19:19:27 seemed pretty consensual in the zuul subteam meeting yesterday anyway
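To illustrate the job-trees-to-graphs change discussed above: instead of nesting jobs in a YAML tree, a project-pipeline entry would list dependencies per job, allowing diamond and other topologies. The snippet below is only a sketch of the shape of that syntax; job names are made up and the exact keys are whatever the spec finally settles on.

    # Sketch of expressing job dependencies as a graph rather than a nested
    # tree; names and exact keys are illustrative, not the final syntax.
    - project:
        name: example/project
        check:
          jobs:
            - unit-tests
            - build-artifacts
            - integration-tests:
                dependencies:
                  - unit-tests
                  - build-artifacts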
19:19:54 #topic Priority Efforts
19:20:15 i don't see any specific updates/blockers for these called out in the agenda
19:20:55 i know that the task tracker and zuul v3 work are proceeding, and zaro was inquiring about some of the testing prerequisites for the gerrit upgrade just before the meeting
19:21:17 at some point we likely need to revisit the ansible puppet apply spec and see whether whatever's remaining on it is still a priority
19:22:50 #topic Zuulv3 sample jobs (pabelanger)
19:22:58 is this a holdover from last week?
19:23:08 looks like the thing which spawned the action item
19:23:08 it is, sorry for not deleting it from the wiki
19:23:25 no problem, sorry for not cleaning up the agenda (or paying close enough attention to last week's meeting minutes)
19:23:41 #topic Plan sprint for getting off precise. Should happen within the next month or so. (clarkb)
19:23:44 ohai
19:23:50 this looks fresh, or at least worth rediscussing
19:24:05 one of the things we said pre ptg was that we should have a sprint to finish up the precise cleanup (and possibly work on xenialification)
19:24:16 precise EOL is a month away or so so we should work on that
19:24:46 yes, we talked a little about this last week in open discussions
19:24:54 thanks for bringing it up again
19:25:11 ah ok, I missed last week (I guess i should've checked logs)
19:25:20 was a time frame selected?
I think we likely need at least several days
19:25:41 no, we didn't select a date
19:26:06 I'm basically wide open between now and summit (no planned travel)
19:26:27 i won't be able to pitch in a week from tomorrow (wednesday the 22nd) as i have a prior obligation which will occupy most of my day
19:26:56 aside from that i don't have any more travel until april 10th
19:26:56 I should be good anytime between now and summit
19:27:06 no travel on the books
19:27:13 maybe we can target next week and just deal with fungi being out :)
19:27:21 I think sooner we start working on this the better
19:27:22 almost a month with no travel for me--quite thrilled
19:27:32 nice
19:27:54 do we have a list handy?
19:27:55 well, i'm only out one day. i'll happily work on it the rest of the week if that's the plan
19:28:34 i'm around (in au time) and happy to help on this too
19:28:41 I don't have a list handy, but shouldn't be hard to get one from ansible/puppet?
19:29:03 might be helpful to estimate how many days
19:29:10 good point
19:29:12 * fungi remembers the days when he could just pull that up in puppetboard
19:29:22 fungi: me too :(
19:29:29 speaking of servers still running on precise...
19:29:37 (for some definitions of "running" in this case)
19:30:08 https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans found our old etherpad
19:30:36 perfect. if anybody's deployed anything at all on precise since then, they're clearly not paying attention
19:30:57 wiki can come off that list
19:31:06 lists, static, planet, puppetdb, wiki, zuul. And pretty sure some of those are done
19:31:36 zuul is now trusty, just checked
19:31:39 static is done too
19:32:18 so that brings us to lists, planet and puppetdb (puppetboard)
19:32:22 yup
19:32:28 and I checked those all are precise
19:32:33 is puppetdb still useful?
19:32:40 it's so very, very broken
19:32:42 planet wants an upgrade to xenial iirc
19:32:47 yeah
19:32:48 jeblair: I think if the service could be made to work yes
19:32:49 ci-backup according to site.pp
19:32:55 jeblair: not to me
19:33:11 pabelanger: oh, i think there is an upgraded one for that which hasn't been switched into production yet?
19:33:23 jeblair: the use case we need from something like it is reporting to humans that are not roots so they can see when their changes happen
19:33:32 fungi: not sure, site.pp just lists precise
19:33:41 clarkb: but only if they look quickly, before the next 10 runs
19:34:03 jeblair: yes. Not sure puppetboard is the best way to solve that problem. But it's the problem we'd like to have solved
19:34:07 well, in this case i say it's "not useful" because it's completely and thoroughly offline for a year or more now
19:34:33 i vote we drop it. i'd love the use case to be handled, but we should stop pretending it is. :|
19:34:37 perpetual "internal server error"
19:34:59 yeah, i won't object to someone redeploying it and getting it working, we can keep the config management around for it
19:35:17 but it's not like deleting the current precise host it's un-running on will be a regression
19:35:47 right
19:35:51 so mostly lists and planet then
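On the earlier question of getting a current list of precise hosts from ansible/puppet: a quick ad-hoc fact query from the puppetmaster might look something like the sketch below. The inventory location and the output parsing are guesses about this particular setup, not an established procedure.

    # Rough sketch: ask every host for its distribution release and print the
    # ones still reporting precise. Output format handling is an assumption.
    ansible all -m setup -a 'filter=ansible_distribution_release' \
      | awk '/SUCCESS/ {host=$1} /"precise"/ {print host}'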
19:36:28 yup. lists is tricksy because we'll need to defer inbound messages on a secondary mx (ideally) while we test out the replacement with copied (migrated?) data
19:36:39 interwebs not great for me currently
19:37:11 planet should be a relative non-event since there's no persistent data. just need to get it working (on xenial i guess?)
and then switch dns
19:37:52 fungi: jeblair (email noob here) couldn't we set up a second MX record for new host with higher priority which would cause senders to fall back on existing host until new host existed?
19:38:02 is the problem in syncing the data in mailman?
19:38:06 fungi: well, we don't really need a secondary mx if the downtime is small (say < 30m).
19:38:17 i'm not sure the term "sprint" is really applicable here, but we do need one or (preferably) more people carefully planning the lists switch and then pre-testing the replacement
19:38:42 sending servers don't usually generate bounces for connection errors until they persist a while
19:39:36 right, holding the inbound queue on a secondary mx would really only be necessary if we can't make the switch (including dns propagation) happen in a reasonably short timeframe
19:39:54 but if we did want no interruption in mx service, we could probably just have the existing one queue and then deliver to the new host.
19:40:17 downside for that is it's an extra bit of mta configuration that needs its own testing
19:40:21 true, and that's only a few lines of exim config
19:40:36 but right, we'd probably want to test it somewhere else beforehand
19:40:59 I think we can take a few minutes downtime for queuing
19:41:03 the worst part of this is that we will lose our ip and its reputation
19:41:07 so maybe we should just take volunteers to work on the planet replacement and to work on the lists replacement and maintenance plan?
19:41:09 and worst worst case some small number of people might need to resend their emails?
19:42:05 jeblair: we could try an upgrade in place...
19:42:18 yeah. we can practice with snapshots.
19:42:54 it worked reasonably well for the wiki server
19:43:23 and while that would preserve our ip address, it's fairly out of character for us to upgrade that way
19:43:33 having had ubuntu lts installs upgraded in place over years it ends up being pretty messy
19:43:37 but worth considering nonetheless
19:43:37 but functional
19:44:29 jeblair: how important is that reputation? do you expect we will end up being rejected by a bunch of MXes if we change IPs?
19:45:04 worth noting, i strongly suspect a redeploy from scratch will end up renotifying all the listadmins in the lists class manifest about their lists being created (along with the default listadmin password)
19:45:10 probably depends on who had that IP before :)
19:45:15 clarkb: it would not surprise me if it took a few days or a week to settle out.
19:45:17 yeah, quite the gamble
19:45:49 rackspace's vetting usually means we're not getting spammer ips, but still.
19:46:15 #link https://www.senderbase.org/lookup/?search_string=lists.openstack.org
19:46:23 i can take bringing up a xenial planet if you like; i maintained a planet in a previous life
19:46:23 so probably worth testing the in-place upgrade with a snapshot then?
19:46:41 email rep. is "good" for both ipv4 and ipv6 addresses
19:47:15 ianw: awesome, pleia2 indicated it should be viable on xenial, we just weren't quite ready to run xenial servers yet at that time
19:47:29 yeah, i'm thinking in-place may be worth trying for this.
19:47:36 no objection here
19:47:55 worth a try
19:47:57 #action ianw try booting a Xenial-based replacement for planet.openstack.org
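Regarding the "only a few lines of exim config" remark above: one way the old server could queue and relay list mail to its replacement during a cutover is a manualroute router along the following lines. This is an untested sketch and the replacement hostname is a placeholder.

    # Untested sketch of an exim router on the old lists server relaying mail
    # for the list domain to a replacement host during cutover.
    # "new-lists.openstack.org" is a placeholder name.
    relay_lists_to_new_host:
      driver = manualroute
      domains = lists.openstack.org
      route_list = lists.openstack.org new-lists.openstack.org
      transport = remote_smtp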
19:48:10 it's pretty much the same mailman version right?
19:48:28 #agreed We'll attempt an in-place upgrade of lists.openstack.org from Precise to Trusty, practicing on instance snapshots beforehand
19:48:56 ianw: yes precise -> trusty mailman is basically the same
19:48:57 #link http://packages.ubuntu.com/mailman
19:49:09 2.1.14 -> 2.1.16
19:49:22 yep, that's good, not also shoehorning a v3 upgrade on top :)
19:49:43 last news i read, mmv3 is still in a questionable state
19:50:10 yeah, that was what i thought too, with all the configuration being quite different
19:50:12 is that something people will be able to work on soon?
19:50:25 any volunteers to work on mailman? I can help there but am definitely not an email expert
19:50:32 if we're already incurring the pain on precise->trusty, do we want to follow that with a trusty->xenial in short order? anybody remember which release got the dmarc workaround patches?
19:51:14 clarkb: i will volunteer to help
19:51:23 fungi: I think we may want to do a single in place upgrade, evaluate how that went. Then decide from there if we want to do a second to xenial
19:51:25 #link https://wiki.list.org/DEV/DMARC says 2.1.26 added from_is_list
19:51:26 better yet, i *do* volunteer to help
19:51:44 #undo
19:51:45 Removing item from minutes: #link https://wiki.list.org/DEV/DMARC
19:51:50 #link https://wiki.list.org/DEV/DMARC says 2.1.16 added from_is_list
19:52:02 fungi: oh 2.1.16 is what precise has so maybe that's less urgent?
19:52:15 so trusty should get us dmarc-related options without needing to consider xenial quite yet
19:52:19 yep
19:52:42 2.1.18 added some more stuff though per that wiki
19:52:47 (I think we should also work to xenialify in the near future too, but don't want to move the goalposts too far ahead since precise is eol real soon now)
19:52:58 xenial would get us 2.1.20
19:53:04 to be honest, i'm not sure we should enable that option
19:53:28 i too am fine with sticking to our guns on ignoring dmarc
19:53:49 some listservs i've got subscriptions on have decided to unsubscribe and block subscribers from affected domains instead
19:53:59 well, block posts from
19:54:11 that is appropriate and consistent with what those domains have expressed as their policy via dmarc.
19:54:47 i'm intrigued by the 2.1.18 dmarc behaviors. i'll have to think about that a bit.
19:55:29 yea, half the blame is on dmarc-asserting domains for not telling their users they shouldn't post to mailing lists, and the other half on dmarc-validating mtas for ignoring the incompleteness which is dmarc
19:56:37 i'm going to spend the rest of the day trying to shoehorn "systemdmarc" into a sentence somehow.
19:56:50 so anyway, i can volunteer to help on the lists.o.o stuff too (i feel fairly confident with its cli utilities, pickle file manipulating tool and archive filesystem layout), though i worry i don't have time to drive the effort
19:57:16 jeblair: i'm sure lennart already has a plan for that one
19:57:16 do we want to pick a specific time next week to work on this further or let volunteers poke at it as they are able?
19:58:04 I guess jeblair and ianw can grab fungi and me for help as needed and go from there?
19:58:17 or is there any other infra-root who wants to gain a deeper understanding for mailman?
19:58:39 there's nothing like the stress of an upgrade to hammer home key concepts ;)
19:58:43 * Shrews hides in the darkest corner
19:58:49 fungi, clarkb: maybe let's poke at it as able, but plan on spending thurs-fri next week making real headway if we don't before then?
19:58:59 jeblair: ++
19:59:02 jeblair: i'm happy to commit to that
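For reference on the from_is_list and 2.1.18 DMARC behaviors discussed above (noting the room leaned toward leaving them disabled): inspecting what a given list currently has set could look roughly like the sketch below. The list name and install path are examples for a Debian/Ubuntu-style package install and may not match this server.

    # Rough sketch: dump a list's config with mailman's config_list tool and
    # look for the DMARC-related knobs added in 2.1.16/2.1.18. List name and
    # path are examples only.
    /usr/lib/mailman/bin/config_list -o - openstack-dev \
      | grep -E 'from_is_list|dmarc_moderation_action'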
19:59:09 are we defaulting to rax as a provider for these hosts?
19:59:15 well clearly the inplace upgrade stays
19:59:20 ianw: for lists yes due to IP
19:59:24 (hopefully that gives us some time to asynchronously make some snapshots, etc). maybe keep an etherpad with a work log.
19:59:24 that's a good question we don't have time to answer in the next 30 seconds
19:59:27 planet could potentially be hosted by vexxhost?
19:59:37 i'm fine with that idea
19:59:52 no objections to planet vexxhost
20:00:04 obviously the plan for lists.o.o is rackspace for now since we're doing in-place
20:00:10 and we're out of time
20:00:13 thanks everyone!
20:00:17 #endmeeting