19:02:19 #startmeeting infra
19:02:19 Meeting started Tue Apr 1 19:02:19 2014 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:20 so it is!
19:02:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:24 The meeting name has been set to 'infra'
19:02:26 Lurking, too :-)
19:02:26 o/
19:02:37 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:02:37 more new faces/nicks
19:02:39 agenda ^
19:03:03 #link http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-03-25-19.01.html
19:03:05 last meeting ^
19:03:09 looks like it is all jeblair on the agenda
19:03:12 #topic Actions from last meeting
19:03:13 * anteaya sits back
19:03:21 jeblair cname puppetboard.o.o and delete puppet-dashboard.o.o server and dns
19:03:24 i half did that ^
19:03:31 the cname exists in dns now
19:03:49 o/
19:03:58 #link http://puppetboard.openstack.org/
19:04:06 i haven't deleted the old server yet though
19:04:15 #action delete puppet-dashboard.o.o server and dns
19:04:18 #action jeblair delete puppet-dashboard.o.o server and dns
19:04:39 jeblair propose organizational cleanup repo renames to infra and then tc lists
19:04:43 i also half-did that ^
19:04:57 I see puppetdb.o.o still works
19:04:59 in that i proposed it to the infra list, where we made some changes
19:05:00 keep that?
19:05:15 anteaya: yes it has to, the nodes are configured to report there
19:05:23 * anteaya nods
19:05:23 so i still need to send a revised version to the tc
19:05:36 #action jeblair send revised repo rename list to tc
19:06:04 #topic Dealing with puppet changes (jeblair)
19:06:13 o/ im here!
19:06:15 o/
19:06:40 okay. so we have puppetboard now, which means that theoretically we can land puppet changes and if they break stuff, it's not solely up to infra-root to deal with the carnage
19:07:04 but i think yesterday we saw that we might need a bit more process around making that actually happen
19:07:05 hooray
19:07:19 since we broke puppet for most of the day, and the infra-root folks fixed it by reverting most of the changes
19:08:06 part of the problem is that puppet manifests can be broken in certain ways which prevent them from being served and subsequently prevent reports from going into puppetdb
19:08:26 * mordred thinks that's a bug in puppet, btw
19:08:30 I think this is a particular special case, and involves changes to the master
19:08:46 so the only outward signs we'd have would be of things not happening like we expect, and eventually nodes showing up as stale in puppetboard (after a day or more)
19:08:54 puppet not being able to compile something seems like a thing you'd want reported :)
19:09:36 it seems like we may want to have someone agree to watch puppetboard for success when we land puppet changes
19:09:45 that person doesn't have to be infra-core per se of course
19:09:50 perhaps we should tune down the 24-hour limit before nodes show up as stale
19:10:08 to help mitigate this too
19:10:13 ++
19:10:22 and there is no way to increase the number of reports we keep, right?
19:10:33 but i do agree that having someone watch puppetboard for a report which shows the intended change got applied successfully would be a huge help
19:10:43 mordred: not yet; apparently that's being worked on
19:11:04 especially for things where it's clear that we want to watch it - such as the nibalizer/AaronGr hiera refactors
19:11:08 nibalizer: do you know if we can lower the 'stale' timeout as above ^ ?
19:11:28 basically the way i spotted it yesterday was that after i approved something which puppet should have updated and went to puppetboard to see when i should expect it to be working, i saw that the server in question hadn't reported for 2 hours
19:11:50 mordred: so maybe our process for +A major puppet changes is that we make sure someone is on-deck and watching?
19:12:01 jeblair: the 'unresponsive' you mean?
19:12:04 took me a few minutes to realize that the timestamps on the latest reports were waaay too old to be sane
19:12:04 thats a configuration option
19:12:05 nibalizer: yes
19:12:10 easy to change
19:12:12 cool
19:12:16 excellent!
19:12:34 we should be able to set that to, what, 30 mins and be okay?
19:12:57 that would mean three missed reports in a row, so sounds fine to me
19:13:26 ++
19:13:35 I can push that change up today
19:13:44 nibalizer: thanks a ton
19:13:47 no prob
19:13:51 so how about when a change has 2 +2 votes, we leave a comment saying "ping an infra-core person in #openstack-infra when you are ready to monitor puppetboard for any adverse effects from this change" ?
19:13:57 jeblair: ++
19:14:07 also, i would like a non-puppet task
19:14:18 and we wait for the driver of that change to ping us, so they drive the scheduling
19:14:23 since i love the puppets, but thats not havin me branch out much
19:14:27 nibalizer: we have plenty of those to choose from
19:14:31 jeblair: sounds good
19:14:45 fungi: where is the task list? and can you pick one thats good to 'warm up' on?
19:14:57 #link https://bugs.launchpad.net/openstack-ci/+bugs?field.tag=low-hanging-fruit
19:15:02 nibalizer: ^
19:15:06 oh perfect
19:15:14 some hang lower than others, but there's plenty to pick from
19:15:39 #agreed infra-core will ask authors of major puppet changes to "ping an infra-core person in #openstack-infra when you are ready to monitor puppetboard for any adverse effects from this change" before approving
19:15:40 jeblair: your plan sounds perfect to me
19:15:58 yup sounds good
19:16:16 +1
19:16:47 #action nibalizer propose change to lower puppetboard 'unresponsive' timeout to 30 mins
19:17:41 cool; as you can tell from my wild swath of approvals, i'm very excited about getting these changes in. :) i hope this makes it smoother.
19:18:05 anything else on the topic?
19:18:09 can't make an omelette without breaking a few chickens
19:18:32 speaking of sacrificing poultry...
19:18:35 #topic Using storyboard (jeblair)
19:18:46 YES
19:18:57 storyboard is getting very close to the criteria where i think we can use it...
19:18:58 (is all i have to say to that)
19:19:10 was afk, sorry, /me agreed re "[23:15:39] #agreed infra-core will ask... "
19:19:12 jeblair: ++
19:19:16 cool
19:19:23 in case you missed it, that criteria, in my mind, solely consists of "it should be able to like track bugs and stuff".
19:19:26 jeblair: are we going to bother with a migration of infra bugs?
19:19:38 jeblair: or just start using it naturally and let things die in launchpad
19:19:47 I assume we will need a migration for the other projects though
19:19:59 I believe we will definitely need a migration for other projects
19:20:03 that's more or less what i had in mind for this topic -- how, exactly, would we like to start using it.
19:20:06 are we having a soft opening?
19:20:07 i'm in favor of the attrition model there, but it could be painful for some users
19:20:19 like having some people/projects using it?
19:20:40 anteaya: this should only extend to -infra projects; it's nowhere near ready for non-infra projects i think
19:20:48 we could move, say, nodepool bugs to storyboard
19:20:54 (start small)
19:21:03 * anteaya nods
19:21:06 since nodepool isn't broken out from openstack-ci on lp yet
19:21:30 fungi: good point; start with nodepool, then maybe zuul....
19:21:33 it may be good to guinea pig a migration with infra though
19:21:43 how many people work with nodepool bugs?
19:21:49 so reserve some subset of bugs for that maybe?
19:21:57 probably, we should move more "active" bugs to storyboard
19:21:58 clarkb: zuul would be something we could potentially migrate, since it's an lp project in its own right now
19:22:04 (i think)
19:22:08 like gate bugs?
19:22:09 not that i know much about story board, but what happens to a project that uses it for openid authentication, since launchpad provides that... does it continue to go there?
19:22:10 clarkb: I think that's a good idea - move infra when we have a migration script
19:22:32 wenlock: at the moment we use launchpad for openid with storyboard
19:22:32 mordred: storyboard has _all_ projects defined in it; i have a little bit of a concern with that
19:22:46 mordred: in that i'm not sure how we best convey to people where bugs should be filed
19:22:47 mordred, ahh cool
19:23:12 SergeyLukjanov: i think that gate bugs need to stay in storyboard until openstack itself moves
19:23:15 jeblair: we could disable the projects list creation and wipe the db... and/or make a second projects.yaml for storyboard for now
19:23:18 because they are processed by e-r
19:23:21 wenlock: though i gather openstackid is coming along...
19:23:25 #link http://git.openstack.org/cgit/openstack-infra/openstackid/tree/
19:23:35 jeblair, do you mean in launchpad ?
19:23:44 SergeyLukjanov: yes, sorry, should stay in launchpad
19:23:48 jeblair: agreed
19:23:59 yup, got it, it's correct re e-r
19:24:04 so i think even if we move "all" of infra to storyboard, we should keep using openstack-ci on launchpad for gate bugs to facilitate e-r
19:24:40 that makes sense
19:24:42 mordred: i think that might be a good idea -- only listing projects in storyboard that use storyboard will help people keep track
19:25:10 also, if we migrate projects (eg zuul, gear), i think we can close their bug trackers on lp
19:25:23 jeblair: agree. I think that ultimately I want one projects.yaml - but for now, having two to track the migration makes sense
19:25:34 jeblair: you want me to take that on?
19:25:42 mordred: that would be swell
19:25:47 also - krotscheck ... how important are your current storyboard bugs in storyboard?
19:25:52 a flag in projects.yaml perhaps, to define whether the project should be in storyboard (or should have a "not ready yet" placeholder on its stories or something)?
19:25:57 are we going to require stackforge projects to switch?
19:25:59 oh, you said that
19:26:15 When you close project on LP, need a message to redirect to Storyboard.
19:26:29 krotscheck: as in - if we wipe stuff from the current db in anticipation of a move of infra things in ...
19:26:31 #action mordred make an abbreviated projects.yaml with only projects using storyboard as their primary tracker
19:26:32 jhesketh: I don't think we have ever forced a bug tracker on stackforge projects
19:26:54 right, so getting down to one projects file may be tricky
19:26:54 rockyg: yeah, worst case when we disable bugs on a project in lp we can update its main overview with a link to storyboard
19:26:56 jhesketh: however, supporting both launchpad and storyboard may be a pain so we might have to
19:26:57 krotscheck: do we need to be careful to not bork your datas? or can we just drop and re-create the db?
19:27:39 mordred: what do we need to delete?
19:27:52 mordred: we can just delete all projects that != storyboard, right?
19:28:03 jeblair: the existing 250 projects - and I think there are some fake stories in there
19:28:07 jeblair: sure
19:28:12 jeblair: yeah - good point- it's not that hard
19:28:14 :)
19:28:23 in terms of supporting both launchpad and storyboard is there a plan to manage features provided by jeepyb? For example, if you have "closes-bug xyz" how do we clarify which system that is in
19:28:32 jhesketh: ultimately we don't
19:28:40 (currently it looks like story numbers are sequential)
19:28:40 mordred: what if instead of making another projects.yaml, we just stopped using it for auto-project creation and left that as a manual superuser function for now?
19:28:49 jeblair: fine by me
19:29:01 jhesketh: though i think it's something we will need--i just doubt anyone's started planning that bit yet
19:29:40 jhesketh: for example, we could s/bug/story/ in our standard commit message headers and comment links
19:29:51 if we had a "uses-storyboard" flag in projects.yaml - it would be fairly easy to do a lookup on projects.yaml to see where most things should direct themselves to
19:29:56 bit of a pain for people to remember which to use though
19:30:05 with the exception of gerrit links
19:30:10 mordred: good point wrt hooks at least
19:30:11 * ttx lurks
19:30:13 less so for links
19:30:15 yeah
19:30:34 mordred: has anyone volunteered to make a lp -> storyboard import script?
19:30:42 jeblair: I think nibalizer just did
19:30:45 ;)
19:30:50 haha
19:30:50 mordred: so if we have a 'uses-storyboard' flag we could use the one projects file and only set up storyboard for those projects?
19:30:55 mordred: that's how i read that too. ;)
19:31:00 jhesketh: yah
19:31:07 jhesketh: ++
19:31:08 might be a two-birds/one-stone approach
19:31:29 yep
19:31:36 heh
19:31:40 only undefined thing will be gerrit comment links
19:31:45 i'm down
19:32:14 #action nibalizer write lp->storyboard migration script
19:32:23 zaro: i don't suppose new gerrit can make logic decisions in its text linking?
19:32:23 do comment links have access to project name? they don't do they?
19:32:38 mordred: they have access to the comment itself, which doesn't have the project name
19:32:43 nibalizer: there's a bunch of launchpad api using code in jeepyb you can use for reference...
19:32:48 mordred: I think we will be in a weird spot with commentlinks
19:33:00 because if we could get project-name into the data that the javascript has, we could make a javascript callout thing
19:33:00 fungi: i'm not aware that's possible.
19:33:04 nibalizer: no one has ever written a python interface to storyboard, so that'll be exciting and new. :)
19:33:11 zaro: it was wishful thinking. thanks though
19:33:14 ooo, does it have a rest api ?
19:33:19 nibalizer: it does
19:33:23 well then game on!
19:33:31 mordred: it's not very neat, but as a short-term if gerrit isn't able to handle it you could send all requests to storyboard and if the project doesn't have 'uses-storyboard' flag it can redirect to launchpad
19:33:32 not that i dont LOVE xmlrpc but..
19:33:50 jhesketh: ooh. that's sexy
19:33:55 I LIKE
19:34:08 or some decider proxy anyway
19:34:18 doesn't have to be embedded in storyboard itself
19:34:25 nod
19:34:32 could that be a bit of wsgi we stack on top of storyboard?
19:34:49 yep
19:35:13 just to be clear, we can probably be broken for comment links for the zuul/nodepool move...
19:35:21 mordred: yeah, i think we can live with that
19:35:37 agreed
19:35:53 +1
19:35:58 jhesketh: can you add a story to the storyboard project about the wsgi/redirect idea?
19:36:03 it would be tolerable for a little while
19:36:11 jeblair: yep
19:37:06 okay, i think we have the better part of a plan
19:37:18 I read that word as palm
19:37:22 which is a very different sentence
19:37:55 #topic open discussion
19:38:08 I have 2 things, first is that we're very close to retiring old-wiki, of the two outstanding tasks bug 1172440 has a review up that will close it and 1127411 should be pretty trivial (moving apache configs over to static.o.o)
19:38:10 Launchpad bug 1172440 in openstack-ci "bug day stats should be in openstack-infra" [Medium,In progress] https://launchpad.net/bugs/1172440
19:38:13 reminder, grizzly is dead, dead, deadski
19:38:50 bug 1127411
19:38:51 pleia2: wow, that's hard to believe that we could ever retire old-wiki
19:38:51 Launchpad bug 1127411 in openstack-ci "Move the $project.openstack.org redirects off of old wiki" [Medium,Triaged] https://launchpad.net/bugs/1127411
19:38:56 jeblair: I know!
19:39:00 I'm excited :)
19:39:04 oh right I need to review that change again
19:39:07 deleted the stable/grizzly branches and removed the jobs/d-g support over the weekend
19:39:17 fungi: woot!
19:39:18 in case anyone asks
19:39:37 fungi: should we prep the addition of icehouse jobs yet?
19:39:38 fungi: woot!
19:40:05 my 2nd thing is that I've been chatting with Daisy, things are crazy this time of the cycle but she thinks Pootle will work, wasn't keen on the wiki idea, so she'll pick up Pootle evaluation once translations are done, hopefully we can chat about it again at the summit and start serious work at the beginning of the cycle
19:40:18 clarkb: probably not too early to get started on the config reviews for those. we might even be able to land them before the stable branches exist, not positive though
19:40:23 pleia2: there is another proposed session to talk about it again
19:40:33 clarkb: good
19:40:43 I hadn't paid attention to summit sessions yet
19:40:46 pleia2: cool, i think having someone really driving that (as you have volunteered to do!) will really help. :)
19:41:00 grenade jobs would probably have to wait, though... can't upgrade from havana to icehouse sanely when there's no icehouse branch for anything
19:41:16 fungi: ya, we should probably propose the change then WIP it for now
19:41:28 if we get that far along before stable/icehouse branches are cut
19:41:58 pleia2: i will miss our twice-a-year discussion about moving translations though, if it actually ends up happening this time
19:42:51 in other news, it just dawned on me we have another gauge of how much busier we've gotten... nobody had time for april fools changes to zuul comment strings this year :(
19:43:09 hah
19:43:10 no i tried
19:43:11 jeblair: I will have a zuul change pushed shortly. Mostly want your thoughts on the way the tests have been hacked, to make sure I haven't gotten lost in the woods
19:43:14 i just couldn't figure it out
19:43:32 i even had two pokemon gifs lined up for pass/failure
19:43:41 so much pokemon this 1st
19:43:49 oh that's what that was about ;)
19:43:55 pleia2: gotta catch 'em all!
19:43:57 nibalizer: you may have been able to do it with a commentlink change in gerrit
19:44:03 clarkb: did greghaynes make a change for his thing?
19:44:15 nibalizer: so 's/FAILURE//'
19:44:18 jeblair: no
19:44:21 er jesusaurus ^
19:44:28 :(
19:44:29 so I have an item for the open discussion, we need to configure a swift container to push openstack logs to.. this will allow jobs to individually move over to pushing to swift rather than the big log server
19:44:34 Yes, too busy :(
19:44:58 jhesketh: I support this
19:45:01 jhesketh: jeblair: right so I think this is still the biggest unanswered question for me as far as swift goes
19:45:07 jhesketh: i can probably add one and get the right creds into hiera
19:45:09 have we fixed the CDN is terrible problem yet?
19:45:26 clarkb: we're not using a CDN
19:45:30 clarkb: what are the issues with cdn?
19:45:32 jhesketh: i did it a while ago, and can probably muddle my way through again
19:45:32 clarkb: we don't need to use it because we'll be serving them via os-loganalyze
19:45:34 jhesketh: you have to for public access unless that changed
19:45:43 jeblair: iirc all public access is via CDN
19:45:46 I've patched osloganalyze to stream from swift
19:45:53 oh
19:45:56 huh
19:45:58 ok
19:46:20 clarkb: yeah i think if it's in the same account and region as static.o.o, that works right?
19:46:32 clarkb: https://review.openstack.org/#/c/76796/
19:46:54 fungi: account and region shouldn't matter if we give osloganalyze a way to read from there
19:47:03 fungi: but I think that is as good a place to put it as any
19:47:14 fungi: reduces latency in theory too
19:47:42 ahh, okay. wasn't sure if there were additional controls in place besides just knowing the url
19:47:53 fungi: there are, pretty sure it will need credentials
19:48:07 since this will be not public access
19:48:13 right, 76796 uses them
19:48:29 see the wsgi.conf placeholders
19:48:40 yeah, so os-loganalyzer now needs a config for swift params
19:49:04 those will need to become vars in a puppet erb in the config repo
19:49:09 also, is there any concern that one container won't be enough?
19:49:13 and then we'll put whatever we generate into hiera
19:49:19 * clarkb should probably catch up on the swift stuff later
19:49:33 yep
19:49:58 clarkb: good question
19:49:59 jhesketh: right i know at one point we talked about needing to rotate containers to get around size limits or performance concerns
19:50:19 fungi: correct, particularly for listings, but osloganalyze can possibly deal with that for us too
19:50:44 right, if it was just the link count in the container which was at issue, then maybe it's irrelevant now
19:51:13 yeah that'll be a lot easier to deal with if we aren't trying to figure out a single immutable public url with multiple containers in use
19:51:13 all hail osloganalyze
19:52:14 fungi: actually, maybe we should put the test logs in the test account
19:52:37 fungi: that way the resources are segregated based on role
19:52:57 static happens to serve them but they belong to the test account
19:53:08 clarkb: you mean openstackjenkins vs openstackci?
19:53:16 ya
19:54:31 hrm. i lean toward openstackci -- because we're creating the container
19:54:31 i suppose, from a bookkeeping standpoint. but from a security separation standpoint they are more closely related to the systems in the openstackci account which generate the tokens for writing to and then later read from them
19:54:56 if jenkins were creating them, i think the openstackjenkins would be more appropriate
19:55:19 fungi: because, yeah, the security separation is the main benefit there i think
19:55:30 I see the security benefit as being the inverse here
19:55:47 i see there being only a marginal security benefit regardless
19:56:02 if loganalyze gets owned we only expose the test slaves and their data to corruption/abuse
19:56:18 with the other account its everything
19:56:39 clarkb: osloganalyze doesn't have the _account_ credentials regardless
19:56:48 jeblair: how does it have access then?
19:56:54 yeah, this would be a dedicated api key, right?
19:57:10 I thought the hp patch to make ^ a thing was never merged to keystone
19:57:35 clarkb: wow, i sure hope you can grant read/write access to swift without _also_ giving the ability to spin up servers and delete all data in other services.
19:57:36 * fungi revisits what creds are being handed to openstackwatch for writing to swift
19:59:22 fungi: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/openstackwatch.py#L121
19:59:41 so I'm not sure about the security but it's something I'll have a look into
19:59:51 I'm pretty sure swift has container acls
19:59:54 we can use http://www.rackspace.com/blog/reduce-human-error-with-role-based-access-control-for-rackspace-cloud/
20:00:05 that are not related to the hp patch that didn't merge
20:00:16 ++
20:00:21 #endmeeting
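
Editor's note on the comment-link redirect idea from the storyboard topic (19:33:31 to 19:34:49): the "decider proxy" could look roughly like the sketch below. This is only an illustration under stated assumptions, not something agreed or written in the meeting. The STORYBOARD_STORIES set, the storyboard URL pattern, and the function names are all hypothetical; only the launchpad bug URL form appears in the log above.

```python
# Hypothetical sketch of the "decider proxy" idea: gerrit comment links all
# point at one URL, and this app redirects to whichever tracker owns the id.

# Assumption: ids of stories that live in storyboard. A real version would
# query storyboard itself or consult a per-project 'uses-storyboard' flag.
STORYBOARD_STORIES = {"42", "108"}

# Assumption: placeholder storyboard URL pattern, not a confirmed one.
STORYBOARD_URL = "https://storyboard.example.org/#!/story/{id}"
# URL form taken from the MeetBot bug links in the log.
LAUNCHPAD_URL = "https://launchpad.net/bugs/{id}"


def target_for(item_id):
    """Return the tracker URL that should handle this bug/story id."""
    if item_id in STORYBOARD_STORIES:
        return STORYBOARD_URL.format(id=item_id)
    return LAUNCHPAD_URL.format(id=item_id)


def application(environ, start_response):
    """WSGI entry point: /<id> gets a 302 redirect to the owning tracker."""
    item_id = environ.get("PATH_INFO", "").strip("/")
    if not item_id.isdigit():
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown id\n"]
    start_response("302 Found", [("Location", target_for(item_id))])
    return []
```

Keeping the decision in a small separate WSGI app matches the 19:34:18 remark that it "doesn't have to be embedded in storyboard itself"; ids that storyboard does not know simply fall through to launchpad.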