19:03:34 #startmeeting infra
19:03:35 Meeting started Tue May 10 19:03:34 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:39 The meeting name has been set to 'infra'
19:03:42 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:49 #topic Announcements
19:04:12 #link http://lists.openstack.org/pipermail/openstack-dev/2016-May/094540.html Newton Summit Infra Sessions Recap
19:04:19 (as promised)
19:04:29 #topic Actions from last meeting
19:04:53 #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-05-03-19.02.html
19:05:01 jeblair investigate mail filtering by recipient options for infra-root@o.o inbox
19:05:17 did you come to any conclusion on that topic?
19:05:22 fungi: i was not here for that meeting
19:05:27 o/
19:05:44 though i do recall discussing it at the summit
19:05:47 it was carried over from the meeting you were at
19:05:47 i will start looking at it soon
19:05:58 (I thought)
19:06:08 at any rate, no :)
19:06:18 yeah, this was carried over from an action item at the meeting before the summit. anyway, cool, i'll carry it forward ;)
19:06:24 i've been... busy traveling
19:06:29 #action jeblair investigate mail filtering by recipient options for infra-root@o.o inbox
19:06:39 o/
19:06:45 #topic Specs approval
19:07:17 we don't have any new ones yet, though the WIP one on task tracking is probably going to show up ready for council vote next week
19:07:38 it's linked in my recap
19:07:39 is that the one that thierry proposed?
19:07:41 yeah
19:07:54 o/
19:07:58 * AJaeger has one spec up to mark as completed...
19:08:10 AJaeger: what's the link? i'll approve
19:08:23 https://review.openstack.org/#/c/305426/
19:08:35 thanks
19:08:43 #topic Priority Efforts
19:09:03 thanks, fungi
19:09:26 since i didn't get around to posting the summit recap until today, let's give this one more week for discussion to conclude before we revisit priorities for the coming cycle
19:09:49 We're pretty close to closing out 'Use Diskimage Builder in Nodepool'
19:09:52 #topic Golang infra needs for swift (mordred, notmyname)
19:10:08 i've seen the ml thread!
19:10:12 wheee
19:10:13 so notmyname and I were talking a bit
19:10:14 this is exciting stuff
19:10:40 about what would need to be done from an infra pov to support go in things
19:10:52 and thought it would be a good idea to chat with folks here - cause I probably forgot things
19:10:54 o/
19:11:13 is there any kind of list started with items?
19:11:18 like an etherpad?
19:11:19 investigation and implementation around dealing with mirroring/caching dependencies is the big one I can think of
19:11:44 there are smaller issues like what to do with docs building (likely a combo of sphinx and godoc)
19:11:45 do we have any initial jobs that exercise go compilation/testing?
19:11:46 from my perspective, I'm looking for questions -infra needs answers to and info swift can provide to -infra as this process moves forward
19:11:49 anteaya: not really
19:11:54 mordred: okay
19:11:56 I have a question on reproducibility: if golang builds download unversioned tarballs, how can we ensure that everybody can reproduce the build? Do we need to limit what gets included in a build? Mirror it?
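
(Aside on the reproducibility question above: at the time, a bare "go get" fetched the tip of a dependency's default branch rather than any released version, so the same build run twice could see different code. Illustrative only, with a hypothetical repository:

    $ go get github.com/example/somelib     # clones the repo, builds whatever the default branch points at today
    $ go get -u github.com/example/somelib  # a later update may silently pull different code

This is the gap that pinning tools like godep, discussed below, try to close.)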
19:11:56 but is probably a good idea
19:11:57 ie stuff I need to be working on putting together
19:12:10 AJaeger: so - golang deps are git repos essentially
19:12:18 which means there is no central registry we can mirror
19:12:25 there is a tool called godep
19:12:42 that can be used to make a dependency manifest similar to what npm and rust have
19:12:57 yeah, not being familiar with how go dependencies are ultimately declared, i have to assume they let you specify a git ref or something for reproducibility
19:13:03 and if we used that, we could do a thing similar to our old pypi mirroring scheme to mirror the things we depend on
19:13:07 they do
19:13:18 you can reference shas or tags or whatever
19:13:39 but this is completely new territory, so there is only a basic design from the dark corners of my brain, and we all know how that tends to go
19:13:58 * AJaeger prefers released code instead of random git repos but this might be a way to go...
19:14:04 I think the mailing list also brought up supported versions of go - but to me that seems clear to be "whatever is in xenial"
19:14:06 presumably the discussion of unversioned dependencies was more in regard to a lot of go libs not actually tagging anything, so people using git shas instead a lot of the time?
19:14:22 fungi: yah. that seems to be a non-abnormal practice
19:14:40 anybody have any additional things they can think of that we should point notmyname at?
19:14:41 mordred: what about build artifacts?
19:14:54 might consider inviting monasca folks in for extra info/help on defining this stuff
19:15:07 jeblair: that's an excellent question
19:15:18 so far we're not in the business of publishing built binaries
19:15:22 rockyg: they are welcome to attend any meeting or participate in any mailing list
19:15:28 and source tarballs don't really seem to be a thing in go land
19:15:33 openstack should probably be a "good" upstream and not be one of those who don't tag releases, at least
19:15:33 no one needs an invitation
19:15:35 so our "releases are tags" might work "well"
19:15:51 my biggest concerns are avoiding dep hell
19:15:55 since Go is pretty notorious for it
19:16:03 any worse than node/npm?
19:16:07 fungi: much, supposedly
19:16:11 mordred, notmyname: so when swift tests with hummingbird, it will need to build hummingbird first?
19:16:18 fungi: you end up doing things like making a new repo for major version changes to libs
19:16:21 fungi: for example
19:16:48 clarkb: I think I actually like that better
19:16:48 jeblair: yes, definitely for functional tests. honestly i don't know how unittests work in a compiled language yet
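
(Aside on the unit-test question: go answers it much as jeblair describes for rust just below -- "go test" compiles each package together with its _test.go files and then runs the resulting test binary. A minimal illustration, with a hypothetical package:

    $ cat ring_test.go
    package ring

    import "testing"

    func TestReplicaCount(t *testing.T) {
        // trivial check; the point is that this file must compile first
        if replicas := 3; replicas < 1 {
            t.Fatal("need at least one replica")
        }
    }

    $ go test ./...    # compile the package and its tests, then run them; a compile error fails the run
)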
19:16:48 fungi: it works great in google's one-tree world but falls apart quickly when you have many trees
19:17:01 clarkb: that sounds more like someone avoiding making tags
19:17:12 fungi: well it's because there is no way in Go to respect "tags"
19:17:13 clarkb: it's certainly better than the hipster-semver world view of "I bumped the major version why are you angry that I broke everything"
19:17:16 fungi: or wasn't until very recently
19:17:24 mordred: oh well most Go projects do that
19:17:31 mordred: the smart ones make a new repo and call it foo2
19:17:51 notmyname: in rust at least, part of running unittests is compiling the project, and then also compiling the unittests against the project
19:17:54 anyways there is a pretty famous rant out there on how it is impossible to ever reproduce builds of Go at stages in the past
19:17:57 I should dig it up
19:17:58 notmyname: so I'd expect similar in go
19:18:13 interesting. so anyway we should probably be prepared for the "list" of dependencies to change with some frequency if creating new repos for releases is a common pattern
19:18:17 clarkb: so this is one of the problems godep solves
19:18:33 clarkb: with godep, the godep tool handles the git cloning and stuff
19:18:50 and makes a local "vendor" tree that we would not commit to our git repos
19:19:02 mordred: transitively too? AIUI the issues had to do with: now we are 5 levels deep, someone never published the exact version of bar lib they needed, and now we have to check all 5k commits to bar lib until something works
19:19:03 but is a reproducible tree as it does respect refs and tags
19:19:05 yah
19:19:07 clarkb: yes
19:19:16 one of the things I saw Tim Bell bring up was the ability for operators to use distro packages
19:19:31 mordred: ok sounds like they have addressed the major issues then
19:19:37 so would we be testing using distro packages?
19:19:40 mordred: I think another issue was anyone can delete that repo from github and now you lose
19:19:42 in theory they still can use distro packages of this. distros are packaging projects written in go
19:19:43 anteaya: yah - I think as long as we have the ability to do reproducible builds, then the distros should be able to do that too
19:19:53 mordred: or delete the tag or rebase and force push changes to history
19:19:58 mordred: okay thank you
19:20:00 but those were less common grumps
19:20:07 perhaps we should make our "source tarballs" include the 'vendored' code
19:20:08 reproducible builds are hard if they contain any compiled binaries
19:20:10 in any case ...
19:20:15 we almost certainly want to take some cues from what the distro pain points with go are so that we can find/follow conventions for avoiding those
19:20:28 The only distro that has it as a priority right now is Debian, IIRC.
19:20:30 this is one of the big/hard tooling design issues that needs to be addressed
19:20:40 bkero: it's also part of doing CI
19:20:45 and we're probably not going to solve it right now
19:20:48 bkero: we support software for 18 months right now
19:21:01 I think it's more important for us to come up with a list of things that need to be solved before we're comfortable with such a thing
19:21:11 clarkb: I'm talking byte-for-byte reproducible. Compilers and system libraries just don't do it yet.
19:21:11 so to get the discussion a little back on track, what are the questions we have specifically for the swift/hummingbird devs?
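
(Aside, pulling together the godep discussion above: the manifest godep writes records an exact git revision for every dependency, transitively, which is what makes the vendor tree reproducible. A minimal sketch of a Godeps/Godeps.json -- the project, dependency, and revision here are all invented for illustration:

    {
        "ImportPath": "github.com/example/hummingbird",
        "GoVersion": "go1.6",
        "Deps": [
            {
                "ImportPath": "github.com/example/somelib",
                "Comment": "v1.2.0",
                "Rev": "9f3c14b4e0a7de79f7f3b4b1cf8e3a9d8c2b1a55"
            }
        ]
    }

With a manifest like this, a mirroring scheme only has to guarantee that the referenced repositories and revisions stay fetchable.)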
19:21:26 bkero: I am talking build release x.y.z 9 months from now
19:21:32 fungi: they have the question from us "what do we need them to help us work on solving"
19:21:36 s/from/for/
19:21:41 i don't want this to turn into a rathole about go language shortcomings specifically
19:21:45 o/
19:21:48 like, we know there will be an infra cost that someone is going to need to solve
19:21:52 * krotscheck totally forgot about this.
19:22:06 but we need to flesh that out so that someone can go work on solving it
19:22:12 mordred: swift/designate should own the devstack/d-g changes to properly install their thing
19:22:13 mordred: yeah, i think as a brainstorming session, we've identified the high points; could probably do with writing them up next.
19:22:31 clarkb: "d-g"?
19:22:33 they should decide on a common dep handling machinery
19:22:36 notmyname: devstack-gate
19:22:56 notmyname: it's the thing that sets things up for devstack
19:23:05 is devstack a requirement?
19:23:13 ie for the testing image?
19:23:17 * persia notes that none of the other toolchains (npm, python, C) are currently entirely reproducible, and suggests that such discussion is better done at a toolchain level than an OpenStack level
19:23:29 notmyname: we use devstack for integration tests
19:23:33 notmyname: so, yeah
19:23:54 anteaya had a good idea earlier ... so: https://etherpad.openstack.org/p/golang-infra-issues-to-solve
19:23:55 mordred: yeah, but there's also the concept of different servers in one gate job (eg rolling upgrade tests)
19:23:57 how about we collect things
19:24:05 mordred: thank you
19:24:21 notmyname: yah - multi-node devstack jobs is the usual solution to that currently
19:24:41 mordred: not that I'm opposed, just figuring that this might be an opportunity for "if we could do it over again, we'd do ..."
19:25:03 swift's functional tests also use devstack-gate for setup, right?
19:25:11 fungi: yes
19:25:14 yeah IIRC
19:25:17 with a post gate hook
19:25:31 notmyname: I would really rather we not reinvent the entire world all at once...
19:25:36 :-)
19:25:44 so basically something that adds in the "build hummingbird and run it" step into the current integration and/or functional jobs
19:25:44 notmyname: Go being a new thing is a big enough transition
19:25:56 clarkb: ++
19:28:03 I think my biggest concerns are 1) that designate and swift will decide on two different ways to do things and 2) that infra will get all the work dumped in its lap
19:28:34 happy to help, we do have a lot of experience dealing with lang devtest envs, but I don't think qa or infra should have to update devstack for example
19:28:43 those "things" are what I want to see a list about, so that we don't do them differently
19:29:05 which unfortunately is a list of mostly unknowns until we encounter them
19:29:19 i expect a lot of this will be learned through trial and error
19:29:21 right, I think the best thing we can say is swift and designate will work together to reach common decisions
19:29:28 rather than try and figure it all out upfront
19:29:29 ok
19:29:54 clarkb: +1000 to that
19:29:54 agree with clarkb
19:30:00 we can't really serve as the arbiter between swift and designate teams. luckily they're great people who will have no trouble cooperating toward a common goal
19:30:18 thanks notmyname!
19:30:37 anything else urgent on this topic before i move on?
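
(Aside on the "post gate hook" mentioned above: devstack-gate lets a job export a post_test_hook shell function that runs after devstack setup, which is the natural place for a "build hummingbird and run it" step. A rough sketch only -- the source path, build target, and tox environment are hypothetical:

    function post_test_hook {
        cd $BASE/new/swift
        # hypothetical: build the go components first
        go build -o bin/hummingbird ./go/cmd/hummingbird
        # then run the existing functional suite against the result
        tox -e func
    }
    export -f post_test_hook
)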
19:30:47 thank you for your help, -infra
19:30:57 I kinda expect to keep needing it for this :-)
19:31:04 :)
19:31:07 notmyname: thanks for coming and chatting
19:31:15 i'll keep up with the etherpad and work there
19:31:22 #topic Publish logs for mirror-update.o.o (pabelanger)
19:31:28 ohai
19:31:47 so, we had a request from the release team a few days ago to expose some of our mirror-update logs.
19:31:48 anything sensitive in them we should worry about? can we just do the same trick we're doing for nodepool image builds?
19:32:01 i think they are long and boring :)
19:32:16 the only sensitive thing I can think of is the kerberos auth creds
19:32:21 Ya, I think what we do for nodepool.o.o is a good start
19:32:21 it's much more likely what the release team would actually be interested in is status info
19:32:51 also, we could just write them into afs :)
19:33:01 great point
19:33:13 I would be okay with that
19:33:14 which will then be served via the mirror hosts for easy access
19:33:14 (logs or status details. or even both)
19:33:15 I like that
19:33:47 are there some good afs stats we would want to periodically dump to a file in each volume?
19:34:02 do we already have an updated timestamp file?
19:34:10 * mordred approves of writing them to AFS
19:34:19 I want to say reprepro gives a summary we could write out
19:34:24 fungi: vos examine will give you updated timestamps for RO and RW volumes
19:34:25 that's probably the one thing they want to know more than anything: "when did this volume last get updated"
19:34:26 bandersnatch might?
19:34:30 yah - reprepro keeps info in a db
19:34:50 so we can write out mirror content infos and volume status infos
19:35:14 (also, vos examine needs no creds, so anyone can run it anonymously)
19:35:48 okay, neat.
19:35:53 on the other hand, writing vos examine details out to a file periodically or something does avoid them needing to install a kernel module to find that out
19:36:12 no idea if that's a show-stopper for them
19:36:20 fungi: no kernel mod needed
19:36:39 oh?
19:37:28 ahh, right, just the afs client and krb5
19:37:38 don't even need krb5
19:37:44 http://paste.openstack.org/show/496612/
19:37:48 here's the output
19:37:50 the npm mirror actually includes a status file that's updated with every run.
19:37:52 mordred: kinit!
19:38:04 krotscheck: woot!
19:38:07 http://mirror.dfw.rax.openstack.org/npm/-/index.json
19:38:21 indeed, that's simple
19:38:26 so you can see the last update time for both the RW and the RO volumes (they are the same, so RW has not changed since the last vos release)
19:38:57 Both the date and the two seq parameters are relevant in that file.
19:39:37 centos has updated since the last vos release: http://paste.openstack.org/show/496613/
19:39:57 okay, so the decision is to start by writing the mirror-update logs into a subdirectory of their respective volumes?
19:40:03 Would it make sense to create a single directory with symlinks to the files so that the release team has a single place where they can look?
19:40:11 fungi: i'd write to a different volume
19:40:30 ahh, and then serve it under a different docroot/alias
19:40:31 probably just a readwrite volume with no replica
19:40:47 oh, right, we serve the parent of the volumes from apache anyway
19:40:58 yup
19:41:02 krotscheck: and yeah, could be organized like that
19:41:23 have a directory for each mirror type with appropriate acls
19:41:43 * krotscheck could write a pretty UI Dashboard.
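
(Aside: the pastes linked above show "vos examine" output, which looks along these lines -- the volume name, IDs, and dates here are invented for illustration. The "Last Update" timestamps for the RW volume and its RO sites are the values of interest:

    $ vos examine mirror.npm
    mirror.npm                        536870931 RW    1048576 K  On-line
        afs01.dfw.openstack.org /vicepa
        RWrite  536870931 ROnly  536870932 Backup          0
        MaxQuota  104857600 K
        Creation    Fri Apr 15 03:11:59 2016
        Last Update Tue May 10 18:20:01 2016

        RWrite: 536870931    ROnly: 536870932
        number of sites -> 2
           server afs01.dfw.openstack.org partition /vicepa RW Site
           server afs01.dfw.openstack.org partition /vicepa RO Site
)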
19:42:11 krotscheck: with afs that usually means using "========", but i'm guessing you mean something else :)
19:42:22 :-P
19:42:24 --------
19:42:26 ?
19:43:04 since this is devolving into restructuredtext jokes, we're probably ready for the next topic?
19:43:18 I mean, let's be honest here. We're one blog post away from other large orgs using our mirrors internally.
19:43:18 I am
19:43:26 sure, I think we have a plan to start
19:43:40 krotscheck: Let's not start to discuss team blogs ;)
19:43:40 krotscheck: ++
19:43:44 pabelanger: ++
19:43:53 #topic Trusty upgrade schedule (fungi)
19:44:14 i'm squeezing this in so we don't run out of time, since we punted on it last week without enough people around to decide
19:44:38 we said r16 was probably the best time to arrange to burn through the trivial upgrades. did anyone else have any opinions?
19:44:54 what's that in the old calendar?
19:44:57 http://releases.openstack.org/newton/schedule.html
19:44:58 last time I sort of threw out R-19
19:45:17 and my recollection was R-19 as well, May 23-27
19:45:18 R-16 is june 13-17
19:45:32 d'oh, right
19:45:36 i had them backwards
19:45:44 r19 was the emerging consensus
19:45:54 r-19 is what I remember
19:45:55 I'll be away in moose land thurs-friday of that week, but happy to help mon-wednesday
19:46:06 I'm around that week
19:46:17 pleia2: watch out for the meese
19:46:23 Ya, monday doesn't look good for me, but good rest of week
19:46:29 yeah, i expect this is going to just be dogpiling on the easy ones for some of the week and also trying to plan out the more involved upgrades (which may end up in subsequent separate weeks)
19:46:39 fungi: nods
19:46:40 i am around those dates
19:46:48 so being around for part of that week should still be plenty of opportunity to help knock some out
19:46:52 i should be around then
19:46:53 * AJaeger will be offline R19 and R20
19:46:58 I will be semi around, turns out I might actually be buying a house R-20
19:47:07 AJaeger: for a well deserved holiday!
19:47:13 better than buying a moose
19:47:15 clarkb: congratulations
19:47:25 yeah moose aren't house trained
19:47:31 fungi: hard to say which is more trouble, really
19:47:43 okay, any objections to r19 (may 23-27)?
19:47:43 can I get a vodka drinking bear?
19:47:50 fungi: let's do it
19:47:50 ask mirantis
19:48:00 * mordred will be on vacation may 23-27
19:48:03 fungi: no objections from me
19:48:07 mordred: yay
19:48:08 but has no objections
19:48:25 I will attempt to not help
19:48:34 I may be unsuccessful
19:48:35 #agreed Mass server upgrade to Ubuntu 14.04 will ensue the week of May 23-27
19:49:16 we can discuss further as the date draws closer
19:49:29 I'll see if I can book the sprint channel
19:49:29 #topic Tier 3.14159265359 support (pabelanger)
19:49:33 so we can work in there
19:49:37 thanks anteaya
19:49:44 welcome
19:49:48 okay pabelanger, what is this about? do you want us to get you a pager?
19:49:48 So, this is a result of our zuul issue that happened on Friday, see http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-05-06.log.html#t2016-05-06T12:27:39 for some history
19:50:17 but TL;DR, I was hoping we could have some sort of out-of-band contact list for infra-root.
19:50:31 i have a lot of opinions on this, but will boil them down to one or two
19:50:40 So, a chance for yolanda and myself to wake up everybody? ;)
19:50:58 anteaya did a great job keeping the openstack-infra channel working, but I was looking to get some additional action from infra on how to proceed with the zuul issue at hand
19:51:18 generally, when our root admins are available to assist troubleshooting/fixing something (i.e. not sleeping or on vacation) we're in irc keeping an eye on what's happening
19:51:36 well after fungi finishes I have some thoughts to share on the topic, and thanks for the acknowledgement pabelanger
19:52:16 when the only root admins around are unfamiliar with the thing that's broken, that's an unpleasant experience but it's how most of us have had to get deeper familiarity with these systems in the past
19:53:12 And we have some times where no root admin is around - and have so far accepted that there is an occasional hiccup that takes a few hours...
19:53:14 Right, for me, I think I was just looking for validation to move forward with the plan at hand (restarting zuul). But, yes, that was related to inexperience
19:53:29 anteaya: thoughts?
19:53:35 so while we should be working to improve our important service documentation (always) i don't know that calling someone else to jump on irc is going to force that learning experience
19:53:42 well my thoughts are that you handled the situation really well
19:53:46 as a data point, email or irc to me is usually just as good if not a better way to contact me than phone calls or text messages. I don't tend to let my phone interrupt me and instead poll it like I poll email/irc
19:53:51 and that it wasn't a crisis
19:54:01 but I can understand the pressure you felt
19:54:13 and that "no" is sometimes the best answer to a situation
19:54:14 i treat my phone similarly. i often forget it in another room of the house, sometimes for days
19:54:18 people don't like to hear it
19:54:29 but "no" can actually save folks time and trouble in the long run
19:54:43 pabelanger: I think you did everything right in the situation
19:54:53 we can't expect folks to have experience they don't have
19:54:57 that is unrealistic
19:55:10 we have to operate within the limits with which we are comfortable
19:55:10 Sure, I understand the desire to not have contact outside of IRC. And happy to leave it at that
19:55:15 anteaya: thanks
19:55:16 so anyway, while collaboration with our peers is important, when there is an emergency try to carefully read documentation if it's available and then proceed with caution and remember that we all make mistakes
19:55:20 and then grow them as the opportunity arises
19:55:24 pabelanger: welcome
19:55:51 pabelanger: and have the courage to slow down and not panic - as you did.
19:55:55 i have destroyed my share of our systems over the years. it's probably one of the main ways i've learned about them
19:56:04 ;)
19:56:08 it's also possible we've reached a point where we should revisit the level of service we're offering or striving to offer - and document it
19:56:45 mordred: I think that is a good idea
19:57:00 mordred, ++
19:57:05 ultimately any decision to move forward is up to the person who will have to fix it afterward
19:57:14 and that is personal and different for everyone
19:57:19 yup
19:57:20 i'm cool with that too, though i think we do a remarkable job of not letting the world burn around us already, and documenting that we do that in an ad hoc manner with no off-hours escalation paths is likely to just frighten some people
19:57:34 fungi: it might
19:57:48 but i'm okay with us writing that down nonetheless
19:57:51 fungi: but it might be better to frighten them in a proactive manner - than in a reactive manner when they panic
19:57:58 oh I'm not interested in frightening people
19:58:18 * jeblair assumes the escalation path is the same as everything else in tech -- twitter
19:58:19 but I agree with fungi's statement, as a team we do an awesome job
19:58:21 That isn't a bad thing: it is done extremely well
19:58:22 "we'll get to it" can sound like we're blowing someone off in the midst of a crisis for them - which is not always what we're doing - but we don't always have the time to explain that in the moment
19:58:40 also agree with fungi
19:58:57 jeblair: ack, that's how I contacted mordred
19:59:00 we have a few more topics we're not getting to (gerrit disk utilization, project renaming, vexxhost control plane hosting) so i'll put those first on next week's agenda
19:59:11 thank you
19:59:15 my joke isn't funny anymore :(
19:59:20 Setting expectations means that you almost always surpass them. A good thing.
19:59:31 jeblair: sadly accurate actually
19:59:33 pabelanger: fwiw, I did not see that tweet until well after the situation was over
19:59:45 mordred: pew
19:59:58 so for the record, you contacted me via IRC :)
20:00:00 okay, thanks everybody! see you all back in #openstack-infra
20:00:05 #endmeeting