19:01:18 #startmeeting infra
19:01:19 Meeting started Tue Mar 20 19:01:18 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:22 The meeting name has been set to 'infra'
19:01:25 o/
19:01:46 * jesusaur lurks
19:01:50 o/
19:02:02 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:07 * fungi has the topic correction queued up for after the meeting, just remind me
19:02:13 #topic Announcements
19:02:29 infra-root should go and sign the rocky release gpg key
19:02:33 o/
19:02:38 the directions for doing this are in the system-config docs
19:02:43 What is the link to see the key on the server again ?
19:02:47 * clarkb makes a note to do it himself
19:02:50 fungi gave that to me the other day but I didn't bookmark it
19:02:55 I seem to remember signing it already
19:03:02 #link https://docs.openstack.org/infra/system-config/signing.html#attestation Attestation instructions
19:03:39 #link https://sks-keyservers.net/pks/lookup?op=vindex&search=0xc31292066be772022438222c184fd3e1edf21a78&fingerprint=on Rocky Cycle key
19:04:03 also find them at the bottom of the releases site when in production
19:04:10 I ended up finding http://sks.spodhuis.org/pks/lookup?op=vindex&longkeyid=on&search=0x184FD3E1EDF21A78 but I guess that's the same thing
19:04:16 #link https://releases.openstack.org/#cryptographic-signatures
19:04:50 but nope - I have not
19:04:53 for the record, it went into production late yesterday, but you can still attest to it any time you like
19:05:19 I'm going to be traveling at the tail end of this week and early next week, which means it would probably be best to have someone else chair next week's meeting
19:05:31 If you'd like to do that let me know in the infra channel later or via email?
19:05:39 i can volunteer, but happy for someone else to take a turn
19:06:05 * fungi got his fill
19:06:11 I'll likely go quiet thursday or friday while I pack and prep for conferency things
19:07:28 And finally, since this group includes people that straddle the ops and dev roles, you may be interested in a proposal on the ops mailing list to merge the ops midcycle with the ptg
19:08:02 oh, last minute but you can announce that zuul &co are no longer infra deliverables as of today
19:08:41 ya was going to bring that up later too. We've had a long standing zuulv3 priority effort meeting agenda item that maybe we need to reevaluate in this split out? We can talk about it in a short while
19:08:47 #link http://lists.openstack.org/pipermail/openstack-operators/2018-March/014994.html Ops Meetup, Co-Location options, and User Feedback
19:09:12 (start of the colocation thread)
19:09:33 #topic Specs approval
19:09:51 #link https://review.openstack.org/#/c/550550/ Improve IRC discoverability
19:09:56 that change could still use some reviews
19:10:19 I also wonder if the slack kills irc gateway plan affects our other irc related spec
19:10:36 It doesn't really affect us directly as we don't use slack but there have been a non-zero number of people wanting to use our bots with slack
19:11:00 clarkb: "the slack kills irc gateway plan" ?
19:11:11 clarkb: it makes me less inclined to be accommodating, tbh
19:11:15 corvus: slack recently announced that in mid May their irc gateway would be shut down
19:11:48 corvus: basically forcing all bots to use their applications interface, which is quite limited on the free tier
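(Editor's note: the attestation doc linked above boils down to fetching the cycle key, checking its fingerprint, signing it with your own key, and publishing the signature. The sketch below is a minimal illustration of those steps using the fingerprint from the meeting links; the keyserver choice is an example and gpg --sign-key is interactive, so the system-config signing doc remains the authoritative procedure.)

```python
#!/usr/bin/env python3
"""Rough helper for attesting to a release cycle signing key.

A minimal sketch of the usual keysigning steps; the keyserver below is
only an example, and the canonical procedure is the system-config
attestation documentation linked in the meeting.
"""
import subprocess

# Fingerprint of the Rocky cycle key, taken from the keyserver link above.
FINGERPRINT = "c31292066be772022438222c184fd3e1edf21a78"
KEYSERVER = "hkps://sks-keyservers.net"  # example keyserver; use whichever you trust


def gpg(*args):
    """Run gpg with a fixed keyserver, echoing the command for transparency."""
    cmd = ["gpg", "--keyserver", KEYSERVER] + list(args)
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Fetch the key, show its fingerprint for manual verification, sign it
    # with your own key (interactive prompt), then publish the signature.
    gpg("--recv-keys", FINGERPRINT)
    gpg("--fingerprint", FINGERPRINT)
    gpg("--sign-key", FINGERPRINT)
    gpg("--send-keys", FINGERPRINT)
```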
19:11:59 (I'm sure that that isn't related at all to killing the gateway)
19:12:03 clarkb: you mean http://specs.openstack.org/openstack-infra/infra-specs/specs/irc.html ?
19:12:11 clarkb: that doesn't actually mention slack
19:12:36 in fact, i'd say a key characteristic of that spec is that we've de-prioritized non-irc support for our bots
19:12:42 corvus: it doesn't, but one of the other alternatives we had considered in the past was using that one lib that talks all the protocols.
19:12:47 corvus: yes
19:13:08 yeah, so i guess that's now *even less* important
19:13:09 #link https://review.openstack.org/319506 [abandoned] Add spec for hosted IRC client
19:13:15 is also potentially related
19:13:39 fwiw I'm happy to continue using irc. I think it would be difficult for us to support slack natively
19:13:52 but I've also suggested in the past that "hey all the bots work with slack right now because irc"
19:14:18 i'm increasingly in favor of helping people find ways to make irc fit their needs rather than support additional protocols which merely fracture the community further
19:14:30 ++
19:14:34 yep, sign me up for that
19:14:56 I know for a fact that several projects are using a slack <-> irc bridge (with something like https://github.com/ekmartin/slack-irc )
19:15:05 Kolla does it, ARA does as well -- not sure if there's others
19:15:19 ok sounds like we don't need to take a major shift here as we are happy pushing towards irc
19:15:35 I mean, as much as I dislike Slack.. there's people who dislike IRC just as much and that shouldn't prevent them from contributing and discussing things
19:15:41 sure it should
19:15:46 it's not free software
19:15:53 and we don't require folks to use free software to work with our systems
19:16:04 I never said it was a requirement to use Slack
19:16:07 see https://governance.openstack.org/tc/reference/irc.html
19:16:26 I said there's people who prefer Slack to IRC and if we can let them use Slack to talk to us, why not ?
19:16:39 that I think is an entirely separate conversation
19:16:46 Yeah it is, sorry for sidetracking
19:16:49 (I was mostly scoping this to the specs we have related to changes to our bots)
19:17:01 (and sounds like there is agreement that we don't need to update any specs related to that)
19:17:22 we can talk about the other thing post meeting if we really want to
19:17:43 #topic Priority Efforts
19:17:43 translating: "if people prefer proprietary software we should go out of our way to support it" (i can't personally agree with that)
19:17:59 #topic Zuul v3
19:18:13 i had an obvious typo in a previous remark, but i'll leave it as an exercise for the reader so we can move on
19:18:38 corvus: I parsed it :)
19:18:53 so about dropping this item --
19:19:04 as mentioned earlier Zuul and friends are now independent projects no longer tied to OpenStack (and consequently technically infra)
19:19:12 i think we're really close to the v3 release at this point
19:19:22 to your earlier foreshadowing, i think we should still have a zuul v3 priority effort until v3 is released (possibly longer). there's plenty of work to be done on the implementing-for-openstack-infra side still
19:19:23 This has been a long standing agenda item and ya I was going to say maybe we just keep it going through the release?
19:19:59 and i think we've covered everything in the spec (with the exception of some CD stuff -- but arguably, you can accomplish that with secrets now anyway)
19:20:15 there's still a fair amount of job conversion long tail which _could_ justify the priority effort extending past 3.0.0 getting tagged
19:20:18 the last several meetings this has mostly been an operational topic anyway; less about development.
19:20:25 then any operational concerns would be normal agenda items going forward. I think the consequence of this is that we should probably reevaluate our priority efforts and see if anything needs to be prioritized.
19:21:10 keeping this on the list until release and then just adding operational items as they come up seems reasonable. but also, if we wanted to go ahead and drop it, that works too.
19:21:25 This is neat because it means we are almost done with a long standing item and zuul gets to be free and hopefully used by far more people.
19:22:04 I think it will come up - somehow. Either with new features or new ways of doing jobs.
19:22:28 yeah, it will be nice to cross it off the in-progress list. i suppose v3 release is a good enough milestone we can call it no longer transitional
19:22:37 Just as a heads-up: tox-siblings is a current topic that mordred is digging into, this caused a bit more havoc than expected ;(
19:22:43 AJaeger: ya I think it will remain on topic to talk about how zuul affects infra and openstack. I think bringing this up is more about fungi's point
19:22:57 so rather than a standing priority agenda item we'll move to as-needed agenda items
19:23:03 clarkb: works for me
19:23:05 AJaeger, clarkb: I think we've got tox-siblings fixes almost ready
19:23:28 mordred: Hope so - just wanted to share with the team so that they are aware of it
19:23:37 incidentally, one of the tox-siblings bugs was also present in the tox_install.sh scripts - but nobody had noticed it wasn't doing the right thing
19:23:50 which is good - I think that means infra has accomplished the majority of why this was a priority effort in the first place. Which is a major milestone
19:24:06 clarkb: ++
19:24:08 and it is also a transitional step for zuul itself
19:24:53 we'll all have to take a drink of our favorite beverage once that tag is cut
19:24:55 * fungi toasts another major success
19:25:12 oh, i'm early i guess ;)
19:25:40 as for zuulv3 topics proper I think the biggest one is pointing out that if you are running a zuulv3 you want to update all your executors to run latest master
19:25:50 there were several security bug fixes last week. Infra has updated its executors
19:26:01 anything else we want to go over re Zuulv3?
19:26:02 and there will be at least one more security fix, hopefully this week
19:27:19 Look out for that on the zuul mailing list
19:27:35 at least one more security fix before 3.0.0 presumably
19:27:40 ya
19:27:52 i expect plenty more security fixes in the coming months/years because it's software
19:28:15 #topic Project Renames
19:28:17 * fungi should clearly not seek a side job in public relations
19:28:21 fungi: ha
19:28:51 fungi has written up a new project rename process. I suppose we should all review that (do we want to wait to merge it until after we perform the renames?) and mordred has provided specifics for his project rename
19:29:00 I think this means we are basically ready to give it a go
19:29:13 \o/
19:29:31 it's not so much written up as written out. i just deleted things ;)
19:29:46 I'm probably not going to be around to help thursday-wednesday if we want to go and do it real quickly, but that shouldn't stop us I don't think
19:29:48 oh what's the link?
19:29:56 #link https://review.openstack.org/554261 Update Gerrit project renaming for Zuul v3
19:30:01 was just fishing it out
19:30:11 will review
19:30:16 thanks!
19:30:43 looking at the release schedule next week should be safe too
19:30:45 it's primarily an attempt at encoding what we discussed in the meeting last week and subsequently over the course of the week in #openstack-infra, so hopefully no surprises
19:30:52 and probably the week after that
19:31:05 though I'm somewhat hopeful we'll do it sooner than that just to get past it
19:31:23 i can assist fri, but would prefer not to drive
19:31:33 oh jeez. I have a lot of patches to prepare
19:31:59 i've been asked to also help with some storyboard imports friday, so am willing to help but would also rather not be in the hot seat
19:32:00 mordred: should we wait a bit then?
19:32:17 I think mordred is also traveling late this week?
19:32:35 corvus: no - it's fine - I just need to make rename patches for all the repos listed in fungi's patch
19:32:36 next week is starting to sound better
19:32:50 ya maybe March 29 or 30
19:33:01 mordred: hopefully most of those don't need patches, it's more a list of places to check whether they need patches
19:33:04 dmsimard: have you done a rename yet?
19:33:11 I have not
19:33:15 * mordred is not travelling friday - but is travelling monday - 29 and 30 are both great
19:33:19 that would be the easter weekend, not sure who has holidays there
19:33:25 mordred: oh we'll miss you on the weekend then?
19:33:30 dmsimard: [sales pitch] it uses ansible!
19:33:31 * clarkb wonders if he needs to brush up on his zuul
19:33:34 clarkb: no - I'll be in LA on the weekend
19:33:45 clarkb: I'm just flying to LA first thing saturday morning
19:33:45 mordred: ah just not friday gotcha
19:33:59 frickler: arg
19:34:05 * mordred has symphony tickets for friday for the brandenburg concertos
19:34:25 mordred: if we wanted to do it friday would you be able to drive it?
19:34:31 * corvus now has bach stuck in his head
19:34:37 (to avoid easter weekend conflict)
19:34:50 clarkb: yes, I can drive it on friday if need be
19:34:58 i have the utmost respect for people of religion, but must admit i have a hard time remembering when religious holidays fall
19:35:07 fungi: easter is a hard one because it moves around
19:35:19 also - the followup for openstacksdk for the rename is doing the storyboard transition - so if fungi is already in storyboard transition brainspace, that could be handy
19:35:29 easter is easy ...
19:35:40 clarkb: there's school vacation starting a week earlier for my kids, helps remembering ;)
19:35:40 mordred: sure, happy to
19:35:44 it's the first sunday after the first full moon after the vernal equinox
19:35:49 mordred: though also, renames in sb are super easy
19:36:05 fungi: nod - I figured I'd just do it after the rename to keep churn low
19:36:17 if corvus and fungi are able to help out and mordred can drive on friday I think that gives us enough humans to give it a go
19:36:19 it's, like, one sql update query
19:36:25 and then any other infra-roots wanting to be involved can help
19:36:32 (I'll likely be floating around too just not persistently)
19:36:48 clarkb: are we past needing 3 hour reindexes now?
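(Editor's note: fungi's "one sql update query" above refers to the StoryBoard side of a project rename. The sketch below is a rough illustration of what that could look like, assuming StoryBoard's MySQL database keeps the full "namespace/repo" name in a projects.name column; the table and column names, connection details, and the example project names are all assumptions to verify against the real schema, and you would want a database backup before running anything like it.)

```python
#!/usr/bin/env python3
"""Sketch of the StoryBoard half of a project rename.

Assumes a ``projects`` table with a ``name`` column holding the full
"namespace/repo" name -- check the actual StoryBoard schema (and take a
backup) before doing this for real.
"""
import pymysql

OLD = "openstack-infra/example-repo"  # hypothetical old name, not the actual rename list
NEW = "openstack/example-repo"        # hypothetical new name

conn = pymysql.connect(host="localhost", user="storyboard",
                       password="secret", database="storyboard")
try:
    with conn.cursor() as cur:
        # The whole rename really is a single parameterized UPDATE.
        cur.execute("UPDATE projects SET name = %s WHERE name = %s", (NEW, OLD))
        print("rows updated:", cur.rowcount)
    conn.commit()
finally:
    conn.close()
```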
19:36:56 mordred: yes, it should happen all online
19:36:57 if people review my updated process draft, i'll happily put together a maintenance plan based on it
19:36:58 \o/
19:37:21 mordred: the reindex may take 12 hours, but doesn't require gerrit downtime (though does tend to block git replication)
19:37:33 it's a far less expensive process
19:38:01 I'm able to help too
19:38:10 friday is the weekend for ianw so we should probably let ianw weekend. frickler dmsimard any interest? If so maybe we aim for an early start?
19:38:32 dmsimard is in the same tx i am, i think
19:38:36 er, same tz
19:38:48 i can be around if earlyish, but not sure i have any special insight for this one
19:38:56 I'm on eastern (15:39 currently) and I'm leaving in one minute to pick up the kids from school :p
19:39:12 yeah, that's my tz too
19:39:19 I could watch and participate in the rename ? I'm not sure I could drive it on my own
19:39:23 1500UTC is 8am PDT and early enough that frickler can join in if interested
19:39:26 brb
19:40:00 wfm
19:40:01 clarkb: well on a friday that's pretty late for me already, so I'd rather skip that
19:40:05 proposal: 1500UTC Friday if that works for mordred (as driver) and others that volunteered to help
19:40:07 frickler: ah ok
19:40:10 with only two repos to rename - it SHOULD be easy - even with github clicking
19:40:13 if the revised plan is good, it doesn't actually require zuul downtime, just a very brief (automated) shutdown and startup for gerrit
19:40:17 frickler: I'm thinking it would be good to do it when corvus is awake since it is our first with zuulv3
19:40:23 so hopefully people won't be impacted too much
19:40:29 fungi: which means probably no earlier than 1500UTC
19:40:47 yeah, just plan it according to your needs
19:40:59 yay tab complete failures
19:41:02 clarkb: 1500UTC wfm
19:41:03 1500utc is fine for me
19:41:07 i'm good with whatever time you like. i have no life
19:41:18 ok let's say 1500UTC Friday then
19:41:27 and I'll go ping the release team after this meeting
19:41:44 thanks everyone, it will be nice to put this behind us
19:41:47 sounds good, thanks!
19:41:56 #topic General Topics
19:42:09 ianw: it would probably be good to get a recap on all of the afs changes that have happened
19:42:36 ok
19:42:52 oh, yeah, i'm sure i missed at least some of the details
19:43:08 first thing was i updated mirror-update to use backported bionic-era 1.8~pre5 packages
19:43:25 that is in https://launchpad.net/~openstack-ci-core/+archive/ubuntu/openafs-1.8-xenial
19:43:58 that actually seemed to get things very stable. all reprepro runs went for about 3 days without failure
19:44:27 we found one weird warning, which after discussion seems harmless, and i've proposed https://gerrit.openafs.org/#/c/12964/
19:45:13 after that, i updated all our fileservers to be running with the settings as suggested by auristor and documented in https://review.openstack.org/540198
19:45:26 nice
19:45:38 oh good, that did merge
19:45:45 auristor also mentioned that we should possibly be concerned that our afs servers aren't running as new a version as the client?
19:46:05 or maybe i misread
19:47:00 yeah, as i understand it, there's just a mismatch between resources in terms of threads and callbacks, with 1.8 having more, that may result in lower performance
19:47:24 which is why we increased resources per that docs change right?
19:47:28 oh, right, that the 1.8 client can overrun the number of requests the server is able to accept
19:47:46 okay, so the config change does address that concern?
19:48:05 clarkb: i think we were just really underspecced to start with, using the defaults. we may want to tweak things as we move on, now we have the idea that we should
19:48:14 fungi: that was my understanding. Basically increased the number of server-side resources to match the client-side increases
19:48:20 excellent
19:48:24 ianw: gotcha
19:48:28 i've been fiddling with getting some of the stats out of various tools and possibly sticking them into graphite, so we can at least see where we might have issues
19:48:42 that's super helpful, thanks!
19:49:03 and the last thing is the afs docs jobs, which AJaeger pointed out can be quite unstable, seeming to hang
19:49:23 ianw: those jobs will use the older afs packages on the base distro too right?
19:49:40 we had a bit of back and forth over what this might be, simultaneous vos releases which corvus pointed out wasn't an issue, etc
19:50:19 it's writing from the executor i think, and it seems the rsync hangs?
19:50:37 it's wrapped under a significant number of layers of ansible, etc
19:50:53 oh right, and we updated the afs and kernel on the executors
19:51:02 maybe we need to roll them all the way up to your new 1.8 packages?
19:51:25 anyway ... i will keep an eye on this. maybe new settings will help. otherwise i think we need to keep digging to understand the timeout a bit better
19:51:38 sounds good
19:52:16 Another topic is cloud changes. dmsimard is in the process of bringing up limestone networks' cloud for use as test resources and we are testing out new flavors on vexxhost which require boot from volume
19:52:33 pabelanger: ^ vexxhost seems stable now ya? we just had to switch to using raw images?
19:52:40 it is!
19:52:52 we have 10 nodes, and boot-from-volume with 80GB volumes
19:52:54 pabelanger: any indication if the jobs are happier with the new flavors yet?
19:53:07 clarkb: they seem to be, I haven't seen any timeouts yet
19:53:12 nice
19:53:17 I also think the new mirror is helping with that too
19:53:27 i haven't heard anything else from packethost since we briefed them on our consumption model, but presumably we have them on the way soonish as well
19:53:29 since we no longer have BW constraints
19:53:49 and I think the change on how ipv6 works in limestone will help us bootstrap more quickly there
19:54:07 dmsimard: I'd be curious to know if you have further mirror booting issues with ipv6 there
19:54:18 yah, I plan to confirm with mnaser how many nodes we can get in vexxhost again
19:54:26 clarkb: (back from school), yes, I'll give it another try
19:54:32 ianw: any linaro cloud updates?
19:54:47 ianw: I guess at this point it's adding a limited set of jobs to use the arm nodes?
19:54:58 i released dib 2.12 with all the required changes, and released nb03 from the emergency file
19:55:11 not sure if builds have happened yet, but will keep an eye there too
19:55:42 other than that, no, i think use as people want. i have some things i want to do with devstack support, like etcd versions etc
19:55:52 s/no/yes/
19:56:22 ianw: might want to double check I used the right username / password for nodepool clouds.yaml. I think I did it correctly
19:57:34 really quickly before our hour is up
19:57:39 #topic Open Discussion
19:57:46 pabelanger: yeah, i think i did yesterday, lgtm
19:58:01 I have a patch up to restore the nodepool runtime graphs https://review.openstack.org/553718
19:58:02 anything else?
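(Editor's note: for the AFS stats ianw mentions feeding into graphite, and the nodepool runtime graphs clarkb links above, the underlying carbon plaintext protocol is just "path value timestamp" sent over TCP port 2003. The sketch below is purely illustrative; the carbon host and the metric path are placeholders rather than real infra endpoints, and a production setup might prefer emitting via statsd instead.)

```python
#!/usr/bin/env python3
"""Push a single datapoint to graphite over the carbon plaintext protocol.

The carbon host and metric path below are placeholders for illustration;
pick your own naming scheme, or emit via statsd for anything high-volume.
"""
import socket
import time

CARBON_HOST = "graphite.example.org"  # placeholder, not a real infra endpoint
CARBON_PORT = 2003                    # carbon's default plaintext receiver port


def send_metric(path, value, timestamp=None):
    """Format 'path value timestamp\\n' and ship it to carbon over TCP."""
    timestamp = int(timestamp if timestamp is not None else time.time())
    line = f"{path} {value} {timestamp}\n"
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # e.g. a hypothetical AFS fileserver stat scraped from a local tool
    send_metric("afs.fs01.calls_waiting", 0)
```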
19:58:09 I think https://review.openstack.org/554624 will allow us to 2-stage a gerrit install on review-dev01.o.o
19:58:11 but failing to find per-provider metrics
19:58:15 help appreciated
19:58:16 #link https://review.openstack.org/553718 restore nodepool runtime graphs
19:58:18 hopefully will pass tests
19:59:26 #link https://review.openstack.org/554624 prereq for booting review-dev01
20:00:14 corvus may know about finding per-provider metrics. frickler I often find reading the source is the easiest way to sort that out :/
20:00:20 and with that we are at time. Thank you everyone
20:00:22 #endmeeting
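(Editor's postscript: on frickler's trouble finding per-provider metrics, one low-effort alternative to reading the source is graphite-web's /metrics/find endpoint, which lists the metric paths that actually exist under a given prefix. The sketch below is an assumption-laden illustration: the graphite base URL and the stats.* query prefix would need to be checked against the real deployment.)

```python
#!/usr/bin/env python3
"""List metric paths matching a query via graphite-web's /metrics/find API.

Handy for discovering what nodepool (or anything else) actually reports.
The base URL and query prefix are assumptions to verify; statsd-fed
metrics commonly land under stats.gauges, stats.counters and stats.timers.
"""
import requests

GRAPHITE = "http://graphite.openstack.org"  # assumed endpoint; adjust for your setup
QUERY = "stats.*.nodepool.*"                # assumed prefix; widen or narrow as needed

resp = requests.get(f"{GRAPHITE}/metrics/find",
                    params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for node in resp.json():
    # Each entry is a dict describing one matching path; 'leaf' indicates
    # whether it can be expanded further with another find query.
    print(node.get("id"), "(leaf)" if node.get("leaf") else "(branch)")
```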