19:02:09 #startmeeting infra
19:02:10 Meeting started Tue Feb 25 19:02:09 2014 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:13 The meeting name has been set to 'infra'
19:02:20 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:28 #link http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-02-18-19.01.html
19:02:44 o/
19:02:46 I'm going to reorder the agenda a bit based on priority and what we did/didn't get to last week
19:02:59 it's last-week-part-two
19:03:06 #topic Ongoing new project creation issues (mordred)
19:03:16 this one is important so it gets a spot at both meetings :)
19:03:37 mordred: you have been looking into this... want to talk about your findings?
19:03:39 wow
19:03:41 I'm fancy
19:03:44 yeah - so a few things
19:04:00 first, I want to refactor manage-projects so that it's a set of smaller special-purpose tools
19:04:11 like a tool for making gerrit projects, and a tool for making github repos
19:04:21 nice
19:04:28 create_cgit_repos is a good example of what things should be
19:04:44 along with that, I think it'll make it easier to unittest
19:05:02 so that we can, you know, have some confidence that the tools at least are doing what we expect
19:05:07 sounds very unixy
19:05:11 once that's sorted (or maybe in parallel)
19:05:21 I think that we should stop running manage-projects with puppet
19:05:27 and should instead run it with salt
19:05:58 I have that on my agenda
19:06:05 so that we can express something like "please run these repo creation commands on the cgit mirrors, then run these gerrit project creation commands on the gerrit server"
19:06:10 and have been working on setting up a gerrit
19:06:18 and am making some progress
19:06:20 anteaya: I'll be sure to work with you on that once I get closer to it in my mind-brain
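The salt-driven sequencing described above ("run these repo creation commands on the cgit mirrors, then run these gerrit project creation commands on the gerrit server") could be expressed roughly as an orchestration state like the following. This is a hypothetical sketch only: the state ids, minion targets, and script paths are all made up and are not the real infra configuration.

```yaml
# Hypothetical salt orchestration sketch (run via `salt-run state.orchestrate`).
# Targets and script paths are illustrative, not the real infra config.
create_cgit_repos:
  salt.function:
    - name: cmd.run
    - tgt: 'cgit*'
    - arg:
      - /usr/local/bin/create-cgit-repo openstack/example

create_gerrit_project:
  salt.function:
    - name: cmd.run
    - tgt: 'review.example.org'
    - arg:
      - /usr/local/bin/create-gerrit-project openstack/example
    - require:
      - salt: create_cgit_repos
```

The `require` requisite is what gives the multi-host ordering that the puppet-driven approach lacked.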
19:06:40 I also just mentioned the overall plan to jesusaurus as he knows a bit of salt, so perhaps between the three of us we can knock it out
19:06:53 anteaya: mordred: you may want to pick up where i left off with utahdave on that too
19:06:56 #link https://etherpad.openstack.org/p/Salt-Event-System
19:07:00 so my understanding is that we have three major problems: 1) multi-host coordination 2) github api limits 3) it's broken for empty projects for some unknown reason
19:07:05 salt helps with 1
19:07:25 presumably we can solve 3) with the refactor and better testing
19:07:35 right. so - for 2) I have two thoughts
19:08:13 one is that we can just try less with github - maybe stop setting description fields and homepages, and/or only setting no-wiki/no-downloads on repo creation
19:08:20 which would cut down on api calls, but also remove features
19:08:23 the other idea
19:08:28 is to make a local concept of state
19:08:42 so that manage-projects records what it believes it has successfully done
19:08:55 and then it can more sanely choose to only do needed things in the future
19:09:08 it would mean it would not re-set github toggles if someone went behind its back
19:09:18 maybe even have it confirm that what it thinks it has done has actually been done before it records that state change, just for belt and braces
19:09:19 but I think I'm ok with that, since the incidence of that should be VERY low
19:09:25 fungi: ++
19:09:30 sounds reasonable
19:09:38 or - possibly more specifically
19:09:40 what about proxying all of the github communication and reducing it to the required
19:09:44 this could just be local state from the github tool
19:10:06 and rate limiting via the proxy
19:10:20 how does the proxy know if a command is needed?
19:10:26 clarkb: the problem with that i think is just that we make way, way, way too many github calls, unless you think the proxy could sanely cache those responses for a time
19:10:28 is it a rate limit we're hitting?
we have ways of dealing with that
19:10:39 mordred: more I am thinking that every 15 minutes we set the description
19:10:52 proxy can say no I set that yesterday drop
19:11:00 right. that's why I was thinking we just rework the script to stop doing the naive thing
19:11:02 or, queue it up such that it eventually happens
19:11:15 i think the amount of effort we invest in working with github should be minimal
19:11:19 jeblair: ++
19:11:23 jeblair: yes rate limits and ++ to that
19:11:24 sorry, have some problems with my bouncer
19:11:25 reading scrollback
19:11:43 5k requests per hour
19:11:52 SergeyLukjanov2: that makes me think that you're having problems with the guy who keeps unwanted patrons out of your nightclub
19:11:54 clarkb: that just seems like a lot of requests
19:12:23 jeblair: yeah - I think we're trying WAY too hard to keep github up to date
19:12:26 oh...
19:12:41 clarkb: right. currently one full m-p run seems to use over 1k requests, so we tend to only hit it when testing the script at the moment, but might encounter it more in production over time
19:12:44 mordred: anyway, i think we have the high level requirements: do less with github but keep features. maybe being smarter about using the api but also keeping state can be solutions.
19:12:44 mordred, heh, I mean irc bouncer ;)
19:12:51 jeblair: ++
19:13:08 jeblair: one approach or the other may make more sense as we dive into code
19:13:14 yup
19:13:20 mordred: fungi: don't forget the pull request closer
19:13:32 (which doesn't seem to work at all) will factor into the request numbers
19:13:44 clarkb: i can't forget it--i have an open shell under the account now trying to diagnose what's wrong with the github api that's breaking it
19:13:44 clarkb: well...
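The "local concept of state" idea discussed above — record what manage-projects believes it has successfully applied, and skip redundant GitHub calls — could be sketched as below. This is a hypothetical illustration, not the real manage-projects code; the class name, file layout, and keys are all invented.

```python
import json
import os
import tempfile

# Hypothetical sketch of the "local state" idea: remember what we believe
# we have already set on GitHub so future runs can skip redundant API calls.
class StateCache:
    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self.state = json.load(f)
        except (OSError, ValueError):
            # Missing or corrupt state file: start from scratch.
            self.state = {}

    def needs_update(self, project, key, value):
        """Return True if the recorded value differs from the desired one."""
        return self.state.get(project, {}).get(key) != value

    def record(self, project, key, value):
        """Persist a successful change so future runs can skip it."""
        self.state.setdefault(project, {})[key] = value
        with open(self.path, 'w') as f:
            json.dump(self.state, f)

# Usage: only hit the GitHub API when the cache says the value changed.
path = os.path.join(tempfile.mkdtemp(), 'state.json')
cache = StateCache(path)
if cache.needs_update('openstack/example', 'description', 'An example repo'):
    # ... make the GitHub API call here, confirm it took effect, then:
    cache.record('openstack/example', 'description', 'An example repo')
```

As noted in the discussion, this trades correctness for API quota: if someone flips a toggle behind the tool's back, the cache goes stale until a forced full run, which the team judged acceptable given how rarely that happens.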
funny you should mention that
19:13:57 so i think as a team we need to decide what the current situation is for new projects, and what we need to communicate
19:14:10 I was actually thinking that that could be re-written as a hook we register with github
19:14:22 because i'm personally confused as to which kinds of projects we should or should not create now
19:14:32 mordred: if you want to write that, i'll stop troubleshooting the current vague issue we have with it
19:14:48 mordred: doesn't that mean we have to run a web service?
19:14:56 I think we can create them whenever an infra person feels like running manage-projects by hand - but that it's an as-time-allows sort of thing
19:15:12 jeblair: yes, it does a callback mechanism similar to the old jenkins stuff
19:15:17 jeblair: I was thinking about a hook that would ping zuul ... it's not a well-formed thought
19:15:43 mordred: okay, let's set that aside for a bit and talk about current m-p
19:15:48 jeblair: ++
19:16:12 i'm not comfortable running it because it breaks a lot and there isn't a defined process for how to fix it
19:16:24 how to fix the resulting mess it leaves, i mean
19:16:26 I'm not sure it breaks when we run it by hand, does it?
19:16:40 all the times I've tried to trigger breakages when running it it's worked fully
19:17:02 mordred: none of the 3 problems i stated seemed to be affected by whether or not it's run by hand
19:17:10 mordred: did i miss that nuance or is there a 4th?
19:17:34 1) is avoided when running by hand because the cgit puppet usually runs well before you get around to running it
19:17:54 mordred: how is that possible? isn't it just as likely that it would run on gerrit first?
19:17:56 2) is avoided because I'm lazy when running by hand and only run the project in question, so don't do a full run
19:18:05 jeblair: nope.
it's not m-p on cgit
19:18:28 i refer to the whole system
19:18:39 I mean - if we go back to having it turned off in puppet 100%
19:18:48 no cron - no trigger
19:18:53 got it, that makes sense
19:18:59 but leave the create_cgit_repos in ... 1 is prolly fine
19:19:10 and 2 is prolly fine, because we'll be running per-repo
19:19:31 3 - oy. I got nothing for 3 right now
19:19:32 so that leaves "it breaks for new projects with empty repos". i assume that happens regardless?
19:19:36 yeah
19:19:47 fungi: can you confirm or fail-to-confirm that assertion ^?
19:20:32 guess not
19:20:40 jeblair: i have witnessed it when run under the puppet exec (and we have tracebacks)
19:21:10 i haven't tried to reproduce that failure mode running outside of puppet since we only witnessed it once we had reenabled automatic puppeting
19:21:14 fungi: what's the solution to manually fix it?
19:22:08 jeblair: delete all the on-disk repos and rerun it a couple times, which leads me to believe it could still be a race of some sort in gerrit group or acl processing which we only hit when we don't import a project (that could be affecting the timing?)
19:22:24 all the on-disk broken repos for the project in question, that is
19:22:52 the traceback is a vague "failure" from gerrit when running the create-projects api call
19:23:01 and there's nothing helpful in gerrit's logs that i could find
19:23:22 it sounds like if we wanted to limp along, we can do so by manually running manage-projects (with possibly needing the fix fungi describes)
19:23:27 could we just ban completely empty projects for now to avoid 3?
19:23:37 mordred: i would like to avoid that
19:23:47 k
19:23:48 fungi: I am not convinced it is related to the group or acl things since empty repo creation is handled entirely by gerrit
19:24:01 mordred: i personally don't ever want to say "you are required to use github in order to use our system"
19:24:22 jeblair: yes.
I completely agree
19:24:27 fungi: but I need to do more debugging myself to understand what happens
19:24:36 clarkb: agreed. it's a stab in the dark. could be an internal gerrit issue which is fixed in later releases
19:24:38 so if we're going to run it at all, then i think we need to just eat the cost of empty repos ourselves until it's fixed
19:25:34 i'm personally ok with saying "no new stackforge projects until this is fixed". but the manual process sounds tolerable, and we might be able to learn how to fix problem 3.
19:25:40 so i'm okay with that too.
19:26:19 mordred, clarkb, fungi: since the 4 of us will have to manually run this thing: do you want to continue to accept new projects and run by hand as discussed, or would you prefer a moratorium?
19:26:46 i think if we're going to stop accepting new stackforge projects for a while, we should make a fairly broad announcement
19:27:06 jeblair: I'm fine with running manually when one of us feels we have time - but we should make it clear to folks that we're probably going to batch them up or something
19:27:08 fungi: i agree, and a clear one that states the problem and when we expect it to be fixed
19:27:33 "new project friday"? :)
19:27:39 :)
19:27:40 mordred: batch up ++
19:27:55 otherwise for some (unfortunately large number of) contributors this is their first commit and interaction with gerrit/infra, so going through all that effort only to be told to go away is not going to give them a great impression of our processes
19:27:56 oh - you know what?
19:28:08 how about we request that people put the topic "new-project"
19:28:13 so that we can find new project requests
19:28:19 mordred: ++
19:28:22 mordred: ++
19:28:35 that'll make them easy to batch review and batch apply
19:28:36 i think that would help
19:28:57 so, batch manual creation weekly?
19:29:02 heh... new project fridays
19:29:03 on fridays?
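The "new-project" topic convention makes batching practical because gerrit can be queried by topic (e.g. `gerrit query --format=JSON status:open topic:new-project` over ssh). A filter over that JSON-lines output might look like the sketch below; the sample data is made up for illustration and omits gerrit's trailing stats record.

```python
import json

# Hypothetical sketch: filter gerrit query JSON-lines output down to the
# open changes tagged with the agreed "new-project" topic, for batch review.
def new_project_changes(query_output):
    """Yield the URLs of changes whose topic is new-project."""
    for line in query_output.splitlines():
        change = json.loads(line)
        if change.get('topic') == 'new-project':
            yield change['url']

# Made-up sample standing in for real `gerrit query --format=JSON` output.
sample = '\n'.join([
    json.dumps({'url': 'https://review.openstack.org/1', 'topic': 'new-project'}),
    json.dumps({'url': 'https://review.openstack.org/2', 'topic': 'bugfix'}),
])
batch = list(new_project_changes(sample))
print(batch)  # only the new-project change remains
```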
19:29:06 ++
19:29:09 I can send a quick email - but we should probably put a note in the stackforge docs
19:29:12 about it
19:29:16 (about the topic)
19:29:23 sure. and then i'll go out and drink heavily after i'm done running (and cleaning up after) it
19:29:27 ++
19:29:40 #agreed new projects should have the topic "new-project"
19:29:40 #agreed we will batch create new projects weekly on fridays
19:29:54 okay, final thing
19:29:57 fungi: if you get a chance can you direct me to the puppet logs of failed non seeded project creations?
19:30:04 fungi: I will take a look at them to get more eyes on it
19:30:05 who's going to fix this, and when do we expect it to be fixed?
19:30:16 clarkb: i think i pasted it into the open bug, but i'll confirm
19:30:41 i'd like to include that in the communication around this as well, because this is still a service degradation.
19:30:56 and also, i don't actually want to agree to run them manually if it's not going to be fixed
19:31:14 i completely agree with that sentiment
19:31:14 jeblair: I'll fix it - I've started poking at the code anyway
19:31:27 and I think it's an important thing to have properly automated
19:31:54 mordred: can you drag me along, I would like to learn
19:32:00 anteaya: yup
19:32:03 thanks
19:32:12 mordred: ok. can you estimate when you might have something working enough for us to start trying it in production?
19:32:13 ++ for batching new projects requests
19:32:30 jeblair: depends on how much hacking I can get done on the plane on friday
19:32:51 can I give you an estimate on that next meeting? I don't think the code split/clean is THAT much work
19:33:17 btw, I think that I can help with manually creating projects if someone teaches me / grants permissions... if it sounds useful
19:34:23 mordred: yeah, but the salt thing is more complex and vague
19:34:34 jeblair: yes.
that is true
19:34:39 mordred: part of the problem is that we've been saying 'oh we should have salt run m-p' for years
19:35:07 yes, at least since last year
19:35:09 so i want to have an end date for this
19:35:12 jeblair: well, step one is to get the scripts split up. step two is to be able to run _those_ scripts in the proper sequence by hand to verify that it works as expected
19:35:22 step three would be to have salt run the sequence of things we ran by hand, yeah?
19:35:25 jeblair: ++
19:35:28 to end date
19:35:48 let's call it a drop-dead of mid-march
19:36:05 I think it's entirely reasonable to get it done by then
19:36:26 okay, so we should have it working by ttx's ski vacation
19:36:41 so that we can create the new ttx-goes-skiiing project
19:36:42 wow. I love that we all know when ttx goes on vacation
19:37:07 #agreed new system ready for production-testing by mid-march
19:37:19 hehe
19:37:32 mordred: you do remember I am gone for the second week of march
19:37:50 anteaya: nope. I do not remember that. let's try to get it done sooner then :)
19:38:01 okay, so i think we should send out an announcement that says to expect delays in new project creation through the end of march (let's give us time to deal with issues we find with the new system)
19:38:13 nod
19:38:15 mordred: okay I will find you
19:38:16 and it should also mention the topic name, and that we will batch create on fridays
19:38:19 I can send that
19:38:28 sounds good to me
19:38:41 #action mordred send new-project service degradation announcement
19:39:11 mordred: thanks! i'm very excited we have a plan
19:39:26 anything else about this?
19:40:02 #topic Pip 1.5 readiness efforts (fungi, mordred)
19:40:25 I can jump right in if we want to still consider this tied to tox
19:40:34 i can't remember if i've done anything new on this. mordred?
19:41:27 looks like we've had new and unfortunate activity on clarkb's tox pull request
19:41:32 yes
19:41:35 link?
19:41:39 #link https://bitbucket.org/hpk42/tox/issue/150/posargs-configerror
19:41:50 #link https://bitbucket.org/hpk42/tox/pull-request/85/fix-command-expansion-and-parsing/activity
19:42:06 the first was linked from the second (the latter being his pr)
19:42:15 so I hopped over to #pylib yesterday and spoke about this
19:43:01 there are a couple issues at hand as I gain broader understanding of tox. Basically prior to 1.7.0 posargs were special in the command processing and worked the way we want them but all other variables worked as posargs does today
19:43:13 so they made tox more consistent in less flexible and backward incompatible ways
19:43:31 my change goes the complete opposite direction and tries to make them consistent and more flexible in backward incompatible ways
19:43:46 but since my change comes second it is apparently not ok that it is backward incompatible
19:44:14 ronny's suggestion is to do a different type of variable substitution instead for what we want
19:44:28 but also would like holger's feedback when holger is back from vacation in a week and a half
19:44:55 what is the fallout if we do an openstack-tox branch?
19:45:04 i think we were going to call it xot
19:45:14 I like botox
19:45:15 or just bring back run_tests.sh
19:45:32 upstream is being receptive though
19:45:39 intoxicated
19:45:43 I think they understand the problem now and appreciate how 1.7.0 broke them
19:45:48 s/them/us/
19:45:49 (that's a bit much to type though)
19:45:58 clarkb: will there be a way to be compatible across that boundary
19:46:15 fungi: the command would be "drink"
19:46:16 sdague: yeah by introducing a new feature that does sane argument parsing
19:46:30 clarkb: ok
19:46:31 dhellmann: you have a good point
19:46:50 clarkb: remind me why pip 1.5 might be dependent on this?
19:46:50 so - quick question -
19:47:05 jeblair: we get pip via tox
19:47:10 jeblair: because tox 1.7.0 bundles pip 1.5 so as long as we use tox 1.6.1 we use older pip
19:47:14 gotcha
19:47:17 if we want a tox version which includes pip 1.5 or later
19:47:19 that
19:48:13 we do have a few potential workarounds if we want to use tox 1.7.0
19:48:30 we can stop using python setup.py test and go back to the small shell snippet
19:48:41 we can update python setup.py test to treat posargs as testrargs
19:48:47 basically do what tox does in setup.py
19:49:04 both of these changes will work with tox 1.6.1 and 1.7.0
19:49:53 I still maintain tox is broken and my arg parser is better >_> but I don't see me winning that battle
19:49:54 I'd rather do the second thing - but I think I failed last time I tried doing that
19:49:54 but both of them require updating all branches of all repos
19:50:01 jeblair: correct
19:50:03 but will require changes to all projects (or can we shim that with pbr?)
19:50:04 because the args string is actually parsed by setuptools and then handed to us
19:50:32 clarkb: what gets passed to tox in the broken version?
19:50:44 mordred: an argv with posargs in it
19:50:57 which is then appended to the argv built from the command
19:51:19 so it doesn't get shlex'd in the way you might expect, the literal argv array is appended to the argv passed to exec
19:51:25 yeah
19:51:28 gotcha
19:51:31 it's a cheap way to ignore escaping rules but isn't very flexible
19:52:13 clarkb, mordred: do you want to make a decision now, or wait a few weeks for holger?
19:52:29 I think we should wait a few weeks
19:52:30 I am tempted to wait for holger so that we are good citizens
19:52:39 I don't want to touch pip during ff period anyway
19:52:45 good point
19:53:07 so we're probably actually looking at what to do for J at this point?
19:53:16 yes
19:53:42 * fungi concurs.
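The append-versus-shlex behavior described above — tox attaching the posargs argv verbatim to the command's argv instead of re-splitting it — can be illustrated with a short sketch. The command and posargs values are made up for illustration.

```python
import shlex

# Sketch of the parsing distinction discussed: tox appends the posargs argv
# verbatim, rather than re-parsing it with shell-style splitting rules.
command = ['python', '-m', 'testr', 'run']
posargs = ['--regex', 'test foo']   # one argument containing a space

# Verbatim append (what tox does): 'test foo' survives as a single argument,
# so escaping rules never come into play - cheap, but inflexible.
argv_append = command + posargs

# shlex-style re-parsing (what shell-like substitution would do): the
# space-containing argument gets split in two.
argv_shlex = command + shlex.split(' '.join(posargs))

print(argv_append[-1])   # the space-containing argument stays intact
print(argv_shlex[-2:])   # re-splitting breaks it into two arguments
```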
slushy from this point onward i think
19:53:53 #agreed wait for holger to weigh in on tox changes; target pip 1.5 for Juno
19:54:01 slushy slush
19:54:22 #topic Open discussion
19:54:57 i don't think we have sufficient time for any other major items, so anyone have any smaller items to mention?
19:55:16 there is some pypi-mirror refactoring up
19:55:28 fyi I've started working on extracting common jobs to templates for layout.yaml, starting from requirements check and pypi release
19:55:30 new elasticsearch cluster is on the way, maybe by my dinnertime
19:55:42 any ETA on az2 so i can run a working build?
19:55:57 zaro: i haven't heard anything new
19:55:58 that YorikSar helped me make better than I was making it - which was all driven by trying to re-think how pypi-mirror deals with errors and stuff
19:56:59 also the change to no longer download wheels in pypi-mirror merged yesterday, and we're getting complete mirroring once more
19:57:19 yay
19:57:21 fungi: it may take longer than that :P depends on how long data ends up replicating to new nodes allowing us to shut off old ones (we only have one node of redundancy so can turn everything off all at once)
19:57:29 oh, and i've proposed we stop running grenade on havana a week early
19:57:31 fungi: woot
19:58:25 fungi: ++
19:58:30 thanks all!
19:58:31 #endmeeting