19:03:20 #startmeeting infra
19:03:21 Meeting started Tue Feb  2 19:03:20 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:23 \o
19:03:26 The meeting name has been set to 'infra'
19:03:32 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:37 #topic Announcements
19:03:52 o/
19:04:16 o/
19:04:34 hrm, i thought i'd saved a link for this, just a sec
19:05:21 okay, here we go. sorry about that
19:05:35 #info Mentors needed for GSOC
19:05:40 #link http://lists.openstack.org/pipermail/openstack-dev/2016-February/085508.html
19:06:12 we've had some great infra interns/mentees in the past, and it's a great way to get in touch with potential new additions to the team, to openstack, and to free software in general
19:06:38 i encourage people to give it a try, especially if they've never mentored. it's a rewarding experience
19:06:57 anyway, no other announcements lined up. anything important i should mention before we move on to action items?
19:07:39 #topic Actions from last meeting
19:07:43 #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-01-26-19.02.html
19:07:47 cody-somerville to draft and send HPE Cloud shutdown notice+impact to openstack-infra and openstack-dev
19:07:51 #link http://lists.openstack.org/pipermail/openstack-dev/2016-January/085141.html
19:07:56 thanks cody-somerville for sending that!
19:08:00 No problem.
19:08:01 thanks everyone who worked on maintaining the former hewlett-packard cloud for our use and abuse!
19:08:02 thanks cody-somerville
19:08:08 and thanks to the rest of the infra team for making the removal well-planned, quick and painless!
19:08:14 yes thank you
19:08:15 We've gotten some folks who are interested in donating resources. Is someone following up with them?
19:08:39 who did you contact about that?
19:08:44 there was one which emerged from the infra ml moderation queue yesterday, and i was planning to reply but haven't had time yet
19:09:01 obviously I have backscroll to read
19:09:05 I am currently working with osic for credentials
19:09:10 yay
19:09:20 harlowja says they will have internal discussion and may have something for us
19:09:29 wonderful
19:09:36 clarkb yup
19:09:46 gonna go poke the SE guy here who said he wanted to chat with me about this
19:09:47 #link http://lists.openstack.org/pipermail/openstack-infra/2016-January/003707.html
19:09:56 "safebrands"
19:10:19 though odd that the offer came from their head of marketing
19:10:39 yay safebrands
19:10:42 we have logos on a marketing page now :)
19:10:48 yay
19:10:51 yes we do!
19:10:53 anyway, no need to eat up meeting time with this one
19:10:57 so much progress
19:11:05 nibalizer release gerritlib 0.5.0
19:11:07 #link https://pypi.python.org/pypi/gerritlib
19:11:11 o/
19:11:16 thanks nibalizer for picking that up after i promised to do it and then dropped it on the floor!
19:11:27 yay gerritlib!
19:11:45 i didn't see any fallout from the gerritlib release, so smooth sailing there i guess
19:11:58 #topic Specs approval
19:12:03 PROPOSED: Unified Mirrors (krotscheck, jeblair)
19:12:06 #link https://review.openstack.org/252678
19:12:11 looks like this was discussed as i'd hoped last week, though council voting was deferred for an additional week
19:12:20 #info Voting is open on the "Unified Mirrors" spec until 19:00 UTC Thursday, February 4.
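For context on the gerritlib 0.5.0 release mentioned above: gerritlib is the Python library infra tooling such as jeepyb uses to drive Gerrit's SSH API. A minimal usage sketch, with a hypothetical host, user, and key path; exact call signatures may differ between releases:

    # minimal gerritlib sketch; hostname, username and key path are hypothetical
    from gerritlib import gerrit

    # connects to Gerrit's SSH command API (conventionally port 29418)
    client = gerrit.Gerrit('review.example.org', 'ci-bot', 29418, '/home/ci-bot/.ssh/id_rsa')
    for project in client.listProjects():
        print(project)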
19:12:39 \o/
19:12:50 we've got a status update on the agenda for this already, so i'll avoid spending much time on the spec announcement
19:12:51 i have also pushed up the afs modification of that https://review.openstack.org/273673
19:13:14 cool. everyone let's vote on that too by thursday if possible
19:13:15 (which is what's actually in production now)
19:13:20 #link https://review.openstack.org/273673
19:13:47 #topic Adding a new node to nodepool to support libvirt-lxc testing in Nova (thomasem, dimtruck)
19:14:04 any chance thomasem or dimtruck are around this week to discuss what they wanted here?
19:14:11 fungi: So, this turned out to be a bug that was filed here: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1536280
19:14:12 Launchpad bug 1536280 in linux (Ubuntu) "domain shutdown fails for libvirt/lxc" [Medium,Confirmed]
19:14:43 #link https://launchpad.net/bugs/1536280
19:14:49 We're disabling the affected tests right now to just get some testing for LXC working, and I'm iterating tests here: https://review.openstack.org/#/c/274792/1 and investigating the other failures.
19:15:02 #link https://review.openstack.org/#/c/274792/1
19:15:24 I wouldn't make a special new node just for this. If a newer kernel fixes it we can use that across the board, for example
19:15:32 or use centos/fedora until ubuntu can fix
19:15:43 thomasem: so from the nova mid-cycle last week I got that you want to use a specific kernel for your test
19:15:52 yeah, trying on centos 7 was one of my earlier suggestions as well
19:16:05 or fedora 23
19:16:22 the latter should have a pretty bleeding-edge kernel i think
19:16:26 clarkb: fungi: Good call, I wasn't sure what our options are regarding that, but if we can just specify a different kernel, I would be amenable to that. I haven't tried on other OSes
19:16:49 I'm using 3.18.x in my environments and that works great
19:16:58 thomasem: ubuntu LTSs have the hardware support kernels, which we haven't had to use previously but are available to us
19:17:12 i'm a little iffy on having our test environment for ubuntu 14.04 lts use a nonstandard kernel, and having jobs rely on that
19:17:20 fungi: it's "standard"
19:17:23 fungi: it just isn't default
19:17:26 but if it's a packaged kernel in updates that's fine
19:17:46 yep, no disagreement from me on that
19:17:49 So, you would prefer a different OS entirely? I think that's why we were considering a different node at the time. That's not the biggest problem right away, though. The biggest problem is the other intermittent failures that I don't have a root cause for yet.
19:18:00 Is there a specific kernel patch that is needed? Maybe we can get them to include it in the LTS kernel.
19:18:03 as long as it doesn't also come with new and improved blow-up-all-our-other-jobs support
19:18:15 Hahaha, yeah
19:18:21 fungi: right, that is the risk and why centos7/fedora23 would probably be preferable to start
19:18:43 To avoid Ubuntu node kernel changes affecting everything else?
19:18:44 thomasem: centos is 3.10 ... but probably heavily modified ... f23 is 4.3.3
19:18:49 thomasem: yes
19:18:53 Gotcha
19:18:56 thomasem: i can help you with setup of either
19:19:01 thomasem: ovs, qemu, etc
19:20:11 ianw: Gotcha. At the moment I'm trying to just get this thing passing consistently for the tests that aren't affected by the kernel problem, if that makes sense. But, once we get that part solved, I would be happy to explore other node types that can open up the breadth of tests we can run on LXC reliably.
19:20:20 clarkb: fungi ^^
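Since the failing domain shutdown tracks the host kernel (per the launchpad bug above), one low-effort guard is for the job itself to skip the affected tests on older kernels. A minimal sketch; the 3.18 threshold is only clarkb's working data point above, not a confirmed fix version:

    # skip-guard sketch; the 3.18 threshold is an assumption from clarkb's report
    import platform

    def kernel_at_least(major, minor):
        # platform.release() looks like '3.13.0-77-generic'; keep the numeric prefix
        version = platform.release().split('-')[0].split('.')
        return tuple(int(part) for part in version[:2]) >= (major, minor)

    if not kernel_at_least(3, 18):
        print('host kernel predates known-good libvirt-lxc behavior; skipping affected tests')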
19:20:23 does that all seem reasonable?
19:20:38 okay, so sounds like there are at least a couple of good paths forward without adding a special "ubuntu with seasoning" node type
19:20:42 makes sense
19:20:47 thomasem: yup, makes sense
19:21:24 Okay, awesome. ianw, I will hit you up if I run into snags, does that sound good? I really appreciate the aid.
19:21:32 thomasem: need anything else debated in the meeting, or does this transition to the infra channel/ml and code review next?
19:21:57 fungi: Nope. I think we have a path forward, and if anything starts going wrong, I'll start screaming again.
19:22:05 :D
19:22:09 excellent! we can finally get this off our meeting agenda backlog ;)
19:22:14 woot woot
19:22:18 #topic Scheduling a Gerrit project rename batch maintenance (SergeyLukjanov)
19:22:19 I don't think this constitutes screaming
19:22:30 i do enough screaming for all of us
19:22:40 I've never seen that happen
19:22:47 okay, last week it was decided to kick the ball down the road because lots of people were travelling
19:22:52 anyway sorry to derail
19:23:21 is SergeyLukjanov here?
19:23:49 I'm not excited about this, I'll play along but can't drive
19:23:49 there are a couple of pending renames for official repos. SergeyLukjanov was offering to run the maintenance as long as there are at least some of us around in case something goes awry
19:24:07 this weekend is an unofficial holiday in this country
19:24:18 foosball tournaments
19:24:21 I can be around this weekend but again not in a hurry
19:24:23 so I am mostly not around
19:24:48 i will not be partaking in said tournament mania so can be available...
19:25:13 but also these renames don't seem terribly urgent and there aren't many of them
19:25:28 I'm here if you need me
19:25:33 also this would be our first renames since the gerrit upgrade, right?
19:25:36 yes
19:25:50 oh, so probably worth a bit more attention than usual
19:25:54 so might be best to defer yet another week and have more people available
19:25:57 good point
19:25:58 yeah
19:26:06 i can be around on mornings in my timezone, that will be the same as Sergey. But cannot be around in the afternoons
19:26:21 yolanda: thanks, good to know
19:26:29 I'm off again the weekend of the 12th
19:26:39 until the end of Feb
19:26:49 so not available for rename things
19:27:08 i guess i'll try to circle back around with SergeyLukjanov about a possible window for next week, and we can pick a time when we meet again
19:27:23 yup
19:27:25 #topic Mirror update (jeblair, krotscheck)
19:27:32 you're all afs admins now
19:27:39 ha ha ha
19:27:42 WOOO
19:27:43 Wait
19:27:48 sadpanda
19:27:57 all == infra-root, yes?
19:28:02 afs administration as code! old meets new!
19:28:21 the pypi mirrors are in production and on afs
19:28:34 we're still keeping the old ones around for just a bit in case something goes terribly wrong
19:28:35 anteaya: all == anyone contributing patches and looking at whatever we can expose on dashboards and graphs
19:28:39 Wheel mirror work is still in progress.
19:28:44 fungi: ah okay, thanks
19:29:06 in general, the theory about it being fast by serving from local cache seems to be holding
19:29:08 anything special we need to know when reviewing?
19:29:29 it does turn out that trans-atlantic udp is quite slow
19:29:48 jeblair: are we going to set up a replica in europe?
19:29:50 so when the ovh mirrors fetch something from the fileserver, it takes a bit longer
19:30:00 clarkb: if we did, it may improve that ^
19:30:05 AJaeger: probably for now, being aware that pip is installing from a cached backend for the pypi mirrors in our jobs is a good place to start
19:30:49 but also, even within the us, when we transfer hundreds of gb between data centers to make the read-only replicas, that turns out to be quite slow too
19:31:00 in theory there should be no impact, but be on the lookout for oddities in jobs which could be explained by stale caches, cold caches, cache misses
19:31:16 we need faster-than-light packets
19:31:23 thanks, good question AJaeger
19:31:23 fungi, ok, we learn as we go ;)
19:31:25 our initial sync of the pypi mirror to a new read-only site took much longer than i guessed, and ended up getting aborted
19:31:41 i started another initial sync last night in a safer manner, and expect it to finish wed night
19:32:04 i'm going to manually release it after that a few times until i'm happy that the deltas are reasonably small and fast
19:32:12 then we can switch back to automated releases
19:32:37 and then, some time in the future, maybe we can look into whether there's anything we can tune to make this faster
19:32:53 (cern measures their afs throughput in gbps)
19:32:54 jeblair: the expectation is that only the initial sync is slow, right?
19:33:10 clarkb: yeah, it's an incremental system so should speed up considerably
19:33:11 doesn't seem terrible as a startup cost, really
19:33:43 consider that bootstrapping a pypi mirror from scratch with bandersnatch takes a similarly long amount of time
19:33:49 [end of my report]
19:33:51 indeed :)
19:34:17 except in this case we (in theory) incur that cost once now instead of for every new mirror server we create
19:34:18 If we start doing replication clones, I assume we'll cluster a couple together to avoid having to repay that setup cost in event of failure of one of the nodes?
19:34:46 right now we have 2 fileservers, in rax dfw and ord
19:35:12 i don't think that needs to change in the immediate future
19:35:29 as it's actually the mirror servers (which are afs clients) that are doing the real local caching
19:35:37 and they are in each region we have nodepool slaves
19:35:43 * krotscheck1 is more or less done with the wheel_mirror patches, excepting some typos and cleanup.
19:36:03 * krotscheck1 is waiting for local tests to pass before uploading a (hopefully final) patchset
19:36:31 yeah, i think those are ready once we get the incantation right :)
19:36:51 awesome
19:36:51 i created the first wheel volume in afs and mounted it
19:36:55 * krotscheck1 is fresh out of goats, will be switching to tofu sacrifices.
19:37:02 so all the externalities have been satisfied
19:37:10 krotscheck1: it's squishy
19:37:17 Most of the system-config patches should be ready, though.
19:37:24 It's only the job definitions that are pending.
19:37:35 jeblair: Do we already have a wheel slave?
19:37:54 krotscheck1: no
19:37:57 fungi: right?
19:38:04 and we will need one for each platform
19:38:11 or otherwise chroot/container
19:38:16 clarkb: right, though we're starting with only ubuntu for simplicity
19:38:18 no wheel slave built yet afaik
19:38:33 so yes, that's an upcoming step
19:39:22 jeblair: that makes sense
19:39:41 so thrilling progress with afs in production! this also makes for a great segue into our next topic
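As a concrete sketch of the manual release step jeblair describes above (pushing the read/write pypi volume out to its read-only replica sites), here is roughly what an infra script might run; the volume name is an assumption for illustration:

    # sketch of releasing an AFS volume to its read-only replicas;
    # 'mirror.pypi' is an assumed volume name
    import subprocess

    def release_volume(volume):
        # 'vos release' ships the (incremental) delta from the read/write
        # volume to each replica site; -localauth uses the server keytab,
        # so this would be run on the fileserver itself rather than with
        # a user token
        subprocess.check_call(['vos', 'release', volume, '-localauth', '-verbose'])

    release_volume('mirror.pypi')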
19:39:54 unless there are more afs/wheel mirror questions
19:40:08 none here
19:40:37 #topic Swift for docs publishing (annegentle, fungi)
19:40:43 #link http://specs.openstack.org/openstack-infra/infra-specs/specs/doc-publishing.html
19:41:11 there was some renewed interest from the docs team in recent weeks/months on this spec
19:41:17 i have prepared thoughts on this, sorry for the bomb.
19:41:24 note that the spec also has an afs alternative section:
19:41:24 http://specs.openstack.org/openstack-infra/infra-specs/specs/doc-publishing.html#afs
19:41:24 it says to do this, we'd have to set up an afs cell, which is a lot of work
19:41:24 but we have an afs cell now
19:41:24 so it's probably worth revisiting
19:41:25 originally, i imagined that to have it work securely on throwaway nodes, we would need zuul to do some complicated stuff with creating principals, pts entries, etc
19:41:25 that's still more work than i'd like to do in zuulv2, but it's possible, and may be less work than the rube-goldberg approach in the spec
19:41:26 however, if we are willing to be a little less paranoid, and trust doc build jobs with afs creds on long-running slaves (like we chose to do with mirror wheel builds), we could get docs into afs with rsync _really quickly_
19:41:32 really what it's mostly lacking now is some available hands to work through implementation
19:41:40 oh, heh. reading
19:41:57 and yes, basically what i was about to say
19:42:09 thanks for saving me the typing!
19:42:20 :)
19:42:52 so the takeaway there is... if there are people who are really amped about afs, this is a great spec to jump on
19:43:28 Would it really need AFS credentials? Or just a normal privileged ssh key to do the build+publish workflow?
19:43:54 wow, nobody wanted to just teach sphinx how to upload things to swift?
19:43:56 i would expect to see a revision of the current approved spec which takes the afs details into account of course, but aside from that i agree the work is much simplified if we go down that path now
19:44:14 and proxy docs.openstack.org to swift?
19:44:15 cody-somerville: logs are simpler, so i proposed doing that for that spec. docs are _much harder_ because of the layout
19:44:20 SpamapS: see above
19:44:26 SpamapS: i was equally shocked ;)
19:45:10 Also, wondering if there is benefit to dogfooding an OpenStack service instead of relying on custom infrastructure here (though the point about layout is definitely fair)
19:45:37 we have different branches writing into basically the same tree (in a predictable/structured manner), so it cannot simply be copied, or even blindly rsynced
19:46:15 * annegentle waves
19:46:32 basically, the only way i know to get what we need is a careful rsync (copy is right out because you can't be smart about deletions)
19:46:36 our dogfooding of said storage solution for log publication has run into some snags, mostly around the browsing experience and need for real filesystem-like metadata too. while i agree we should avoid reinventing the wheel, this wheel was invented decades ago and is still nice and round
19:46:36 yeah, it's the "blindly rsynced" that's tough here
19:47:26 What if each build is published in a unique location/namespace/whatever and then there is an atomic "update to point at latest"?
19:47:31 * annegentle catches up on afs cell stuff...
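To make the "careful rsync" point above concrete: the trick is to confine --delete to just the subtree a given build owns, so stale files in that subtree get pruned while sibling subtrees written by other branches are left untouched. A minimal sketch, with hypothetical paths:

    # 'careful rsync' sketch; both paths are hypothetical
    import subprocess

    def publish_subtree(build_dir, target_dir):
        # trailing slashes make rsync sync directory contents; --delete
        # prunes files dropped from this build, but only within this
        # branch's own subtree, never its siblings
        subprocess.check_call([
            'rsync', '-a', '--delete',
            build_dir.rstrip('/') + '/',
            target_dir.rstrip('/') + '/',
        ])

    publish_subtree('doc/build/html', '/afs/.example.org/docs/liberty/install-guide')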
19:47:44 Yeah, for an immediate solution, seems like just "better things behind the current solution" is going to have to win out over "making swift better"
19:48:42 cody-somerville: that's vaguely what the spec accomplishes -- it's sort of "build this as a unit, and try to drop the unit in place (with rsync)"
19:48:45 cody-somerville: I'd have to think about that... we only "release" the install guides and config refs to a known /relname/ URL right now, and then the contrib dev docs also have "releases"
19:48:58 One should be able to build a site entirely hosted in swift. But if there's time pressure, sounds like we can't dogfood it.
19:49:12 cody-somerville: then guides like the Ops Guide, Security Guide, those are namespaced to cover multiple releases...
19:49:18 well, the docs and logs use cases turned out to differ in a couple of key areas. logs: huge volume, need to track and possibly prune by age, need to generate indexes on the fly; docs: (comparatively) small quantity of data, comes with own indexing pregenerated
19:49:20 cody-somerville: so yeah, it's vaguely like that :)
19:49:27 SpamapS: that's where we started with this, but the actual requirements are almost completely opposite of what swift provides
19:49:44 jeblair: how disappointing. :(
19:49:50 SpamapS: ha. Yeah I wondered how much the reality had moved on with a 1.5 year old spec jeblair
19:49:58 That may explain why I am always puzzled as to why somebody didn't use swift. Maybe it just isn't for what I think it's for.
19:50:04 so that's also a point to discuss, is that spec reflecting reality?
19:50:29 Honestly, it comes up mostly when Google finds an outdated doc that's still on the server because we don't delete.
19:50:38 annegentle: i think the spec could still be implemented as written; my personal feeling is that we'd get it done faster if we pivoted to afs (which is the big thing that has moved on since the spec was written)
19:50:42 the other big win is HTTPS on docs. and developer.
19:50:48 Like, I also think anything that is "a mirror" should be hostable in a thing like swift. But nobody seems to do that, so there must be something about it that just makes that really hard.
19:50:56 jeblair: ok, that's good to know and exactly why I'm asking :)
19:51:04 ++ on https for docs
19:51:10 SpamapS: IKNOW :)
19:51:17 kidding on the shout :)
19:51:19 SpamapS: no one disagrees ;)
19:51:24 SpamapS: well, the docs "site" is built in bits and pieces with different processes modifying overlapping parts of the tree at different times
19:51:25 I'm just concerned with AFS becoming a hard dependency of the CI system.
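cody-somerville's publish-to-a-unique-location idea maps onto a classic pattern: write each build to its own directory, then atomically repoint a "latest" symlink at it. A minimal sketch with hypothetical names (on AFS, the analogous atomic flip happens at volume granularity via vos release):

    # atomic 'latest' pointer-swap sketch; paths and build id are hypothetical
    import os

    def point_latest(site_root, build_id):
        # create the new symlink under a temporary name, then rename it over
        # 'latest'; rename is atomic on POSIX filesystems, so readers never
        # observe a half-updated pointer
        tmp = os.path.join(site_root, '.latest.tmp')
        if os.path.lexists(tmp):
            os.remove(tmp)
        os.symlink(build_id, tmp)
        os.replace(tmp, os.path.join(site_root, 'latest'))

    point_latest('/srv/docs', 'build-20160202T1900')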
19:51:29 and i think either way, we can add https
19:51:36 jeblair: true
19:51:48 so persistence and malleability are needed for the current state of docs.o.o
19:52:30 SpamapS: no public access except via cdn and rudimentary indexing ability are the two killers for us
19:52:41 also swift doesn't seem to be designed as a filesystem, rather as a filestore, and you still need to maintain external indexing (which is where we're still struggling with the log storage effort)
19:53:31 lack of hierarchical indexing, specifically (or any built-in hierarchy at all really)
19:54:10 if folks aren't sick of hearing about afs, i can propose a spec update and describe what that would look like
19:54:24 and here i was just about to ask who wanted to take that as a next step
19:54:25 jeblair: I'd like to see your spec update proposal
19:54:30 i'd like to know whether we're comfortable with semi-trusted doc build jobs
19:54:53 I'd like to read about what makes them semi-trusted
19:54:56 yeah, we can discuss swift's flaws another time. +1 on afs from me, but with the caveat that I'm just a curious party, not a working party, in this context.
19:54:57 i can also write it up both ways i guess
19:54:59 it's no worse, trust-wise, than the status quo so no objection from me for now
19:55:33 spec update is progress to me, and ensures we're still moving towards https and decent sync
19:55:37 "good enough sync"
19:56:05 ok, i'll do that, and we can accept or reject that approach
19:56:14 jeblair: thanks
19:56:18 np
19:56:21 excellent! and also a timely topic in conjunction with afs usage recently going into production for our mirroring
19:56:39 #topic Open discussion
19:56:46 jeblair: will doc publishing have access to the vos command?
19:57:52 pabelanger: any last-minute updates on the upstream development presentation ideas for the summit? did you get anything submitted? cfp deadline is in a few hours
19:58:10 cody-somerville: not directly, too high priv. possibly as a follow-up job like we're doing for wheels
19:58:21 FYI: The new translation setup is nearly finished, the unified approach works fine. Now amotoki and myself are cleaning up and re-enabling all repos again. Then it's figuring out horizon.
19:58:33 Current set of changes: https://review.openstack.org/#/q/status:open%20%20branch:master%20topic:translation_setup
19:58:49 Only the project-config ones are mandatory - reviews are welcome
19:59:15 fungi, pabelanger submitted the lightning talks for sure
19:59:15 AJaeger: yay. :)
19:59:37 thanks pabelanger!
20:00:01 okay, we're out of time
20:00:08 thanks all
20:00:10 #endmeeting