#openstack-meeting log

19:02:09 <jeblair> #startmeeting infra
19:02:10 <openstack> Meeting started Tue Feb 25 19:02:09 2014 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:13 <openstack> The meeting name has been set to 'infra'
19:02:20 <jeblair> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:28 <jeblair> #link http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-02-18-19.01.html
19:02:44 <zaro> o/
19:02:46 <jeblair> I'm going to reorder the agenda a bit based on priority and what we did/didn't get to last week
19:02:59 <fungi> it's last-week-part-two
19:03:06 <jeblair> #topic Ongoing new project creation issues (mordred)
19:03:16 <jeblair> this one is important so it gets a spot at both meetings :)
19:03:37 <jeblair> mordred: you have been looking into this... want to talk about your findings?
19:03:39 <mordred> wow
19:03:41 <mordred> I'm fancy
19:03:44 <mordred> yeah - so a few things
19:04:00 <mordred> first, I want to refactor manage-projects so that it's a set of smaller special-purpose tools
19:04:11 <mordred> like a tool for making gerrit projects, and a tool for making github repos
19:04:21 <jeblair> nice
19:04:28 <mordred> create_cgit_repos is a good example of what things should be
19:04:44 <mordred> along with that, I think it'll make it easier to unittest
19:05:02 <mordred> so that we can, you know, have some confidence that the tools at least are doing what we expect
19:05:07 <fungi> sounds very unixy
19:05:11 <mordred> once that's sorted (or maybe in parallel)
19:05:21 <mordred> I think that we should stop running manage-projects with puppet
19:05:27 <mordred> and should instead run it with salt
19:05:58 <anteaya> I have that on my agenda
19:06:05 <mordred> so that we can express something like "please run these repo creation commands on the cgit mirrors, then run these gerrit project creation commands on the gerrit server"
19:06:10 <anteaya> and have been working on setting up a gerrit
19:06:18 <anteaya> and am making some progress
19:06:20 <mordred> anteaya: I'll be sure to work with you on that once I get closer to it in my mind-brain
19:06:40 <mordred> I also just mentioned the overall plan to jesusaurus as he knows a bit of salt, so perhaps between the three of us we can knock it out
19:06:53 <fungi> anteaya: mordred: you may want to pick up where i left off with utahdave on that too
19:06:56 <fungi> #link https://etherpad.openstack.org/p/Salt-Event-System
19:07:00 <jeblair> so my understanding is that we have three major problems: 1) multi-host coordination 2) github api limits 3) it's broken for empty projects for some unknown reason
19:07:05 <jeblair> salt helps with 1
19:07:25 <jeblair> presumably we can solve 3) with the refactor and better testing
19:07:35 <mordred> right. so - for 2)  I have two thoughts
19:08:13 <mordred> one is that we can just try less with github - maybe stop setting description fields and homepages, and/or only setting no-wiki/no-downloads on repo creation
19:08:20 <mordred> which would cut down on api calls, but also remove features
19:08:23 <mordred> the other idea
19:08:28 <mordred> is to make a local concept of state
19:08:42 <mordred> so that manage-projects records what it believes it has successfully done
19:08:55 <mordred> and then it can more sanely choose to only do needed things in the future
19:09:08 <mordred> it would mean it would not re-set github toggles if someone went behind its back
19:09:18 <fungi> maybe even have it confirm that what it thinks it has done has actually been done before it records that state change, just for belt and braces
19:09:19 <mordred> but I think I'm ok with that, since the incidence of that should be VERY low
19:09:25 <mordred> fungi: ++
19:09:30 <jeblair> sounds reasonable
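
The local-state idea discussed above, including fungi's suggestion to confirm a change before recording it, might look roughly like the sketch below. This is not the manage-projects code. The JSON state file, the `apply_setting` name, and the `set_remote` / `get_remote` callables are all hypothetical stand-ins for whatever API client the real tool would use.

```python
import json
from pathlib import Path


def apply_setting(state_path, repo, key, value, set_remote, get_remote):
    """Apply one remote setting unless local state already records it.

    set_remote and get_remote are hypothetical callables standing in for
    the tool's API client. Returns True if the setting was applied
    remotely, False if the call was skipped.
    """
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {}

    if state.get(repo, {}).get(key) == value:
        return False  # already recorded as done, so skip the API call

    set_remote(repo, key, value)

    # Belt and braces: confirm the change landed before recording it.
    if get_remote(repo, key) != value:
        raise RuntimeError(f"{repo}: {key} was not applied")

    state.setdefault(repo, {})[key] = value
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
    return True
```

With this shape, a repeat run over an unchanged project makes no remote calls at all. The trade-off is the one mordred names: a toggle changed behind the tool's back is not re-set until the state file is cleared.
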
19:09:38 <mordred> or - possibly more specifically
19:09:40 <clarkb> what about proxying all of the github communication and reducing it to the required
19:09:44 <mordred> this could just be local state from the github tool
19:10:06 <clarkb> and rate limiting via the proxy
19:10:20 <mordred> how does the proxy know if a command is needed?
19:10:26 <fungi> clarkb: the problem with that i think is just that we make way, way, way too many github calls, unless you think the proxy could sanely cache those responses for a time
19:10:28 <jeblair> is it a rate limit we're hitting?  we have ways of dealing with that
19:10:39 <clarkb> mordred: more I am thinking that every 15 minutes we set the description
19:10:52 <clarkb> proxy can say no I set that yesterday drop
19:11:00 <mordred> right. that's why I was thinking we just rework the script to stop doing the naive thing
19:11:02 <clarkb> or, queue it up such that it eventually happens
19:11:15 <jeblair> i think the amount of effort we invest in working with github should be minimal
19:11:19 <mordred> jeblair: ++
19:11:23 <clarkb> jeblair: yes rate limits and ++ to that
19:11:24 <SergeyLukjanov2> sorry, have some problems with my bouncer
19:11:25 <SergeyLukjanov2> reading scrollback
19:11:43 <clarkb> 5k requests per hour
19:11:52 <mordred> SergeyLukjanov2: that makes me think that you're having problems with the guy who keeps unwanted patrons out of your nightclub
19:11:54 <jeblair> clarkb: that just seems like a lot of requests
19:12:23 <mordred> jeblair: yeah - I think we're trying WAY too hard to keep github up to date
19:12:26 <mordred> oh...
19:12:41 <fungi> clarkb: right. currently one full m-p run seems to use over 1k requests, so we tend to only hit it when testing the script at the moment, but might encounter it more in production over time
19:12:44 <jeblair> mordred: anyway, i think we have the high level requirements: do less with github but keep features.  maybe being smarter about using the api but also keeping state can be solutions.
19:12:44 <SergeyLukjanov> mordred, heh, I mean irc bouncer ;)
19:12:51 <mordred> jeblair: ++
19:13:08 <mordred> jeblair: one approach or the other may make more sense as we dive into code
19:13:14 <jeblair> yup
19:13:20 <clarkb> mordred: fungi: don't forget the pull request closer
19:13:32 <clarkb> (which doesn't seem to work at all) will factor into the request numbers
19:13:44 <fungi> clarkb: i can't forget it--i have an open shell under the account now trying to diagnose what's wrong with the github api that's breaking it
19:13:44 <mordred> clarkb: well... funny you should mention that
19:13:57 <jeblair> so i think as a team we need to decide what the current situation is for new projects, and what we need to communicate
19:14:10 <mordred> I was actually thinking that that could be re-written as a hook we register with github
19:14:22 <jeblair> because i'm personally confused as to which kinds of projects we should or should not create now
19:14:32 <fungi> mordred: if you want to write that, i'll stop troubleshooting the current vague issue we have with it
19:14:48 <jeblair> mordred: doesn't that mean we have to run a web service?
19:14:56 <mordred> I think we can create them whenever an infra person feels like running manage-projects by hand - but that it's an as-time-allows sort of thing
19:15:12 <clarkb> jeblair: yes, it does a callback mechanism similar to the old jenkins stuff
19:15:17 <mordred> jeblair: I was thinking about a hook that would ping zuul ... it's not a well formed thought
19:15:43 <jeblair> mordred: okay, let's set that aside for a bit and talk about current m-p
19:15:48 <mordred> jeblair: ++
19:16:12 <jeblair> i'm not comfortable running it because it breaks a lot and there isn't a defined process for how to fix it
19:16:24 <jeblair> how to fix the resulting mess it leaves, i mean
19:16:26 <mordred> I'm not sure it breaks when we run it by hand, does it?
19:16:40 <mordred> all the times I've tried to trigger breakages when running it it's worked fully
19:17:02 <jeblair> mordred: none of the 3 problems i stated seemed to be affected by whether or not it's run by hand
19:17:10 <jeblair> mordred: did i miss that nuance or is there a 4th?
19:17:34 <mordred> 1) is avoided when running by hand because the cgit puppet usually runs well before you get around to running it
19:17:54 <jeblair> mordred: how is that possible?  isn't it just as likely that it would run on gerrit first?
19:17:56 <mordred> 2) is avoided because I'm lazy when running by hand and only run the project in question, so don't do a full run
19:18:05 <mordred> jeblair: nope. it's not m-p on cgit
19:18:28 <jeblair> i refer to the whole system
19:18:39 <mordred> I mean - if we go back to having it turned off in puppet 100%
19:18:48 <mordred> no cron - no trigger
19:18:53 <jeblair> got it, that makes sense
19:18:59 <mordred> but leave the create_cgit_repos in ... 1 is prolly fine
19:19:10 <mordred> and 2 is prolly fine, because we'll be running per-repo
19:19:31 <mordred> 3 - oy. I got nothing for 3 right now
19:19:32 <jeblair> so that leaves "it breaks for new projects with empty repos".  i assume that happens regardless?
19:19:36 <mordred> yeah
19:19:47 <jeblair> fungi: can you confirm or fail-to-confirm that assertion ^?
19:20:32 <jeblair> guess not
19:20:40 <fungi> jeblair: i have witnessed it when run under the puppet exec (and we have tracebacks)
19:21:10 <fungi> i haven't tried to reproduce that failure mode running outside of puppet since we only witnessed it once we had reenabled automatic puppeting
19:21:14 <jeblair> fungi: what's the solution to manually fix it?
19:22:08 <fungi> jeblair: delete all the on-disk repos and rerun it a couple times, which leads me to believe it could still be a race of some sort in gerrit group or acl processing which we only hit when we don't import a project (that could be affecting the timing?)
19:22:24 <fungi> all the on-disk broken repos for the project in question, that is
19:22:52 <fungi> the traceback is a vague "failure" from gerrit when running the create-projects api call
19:23:01 <fungi> and there's nothing helpful in gerrit's logs that i could find
19:23:22 <jeblair> it sounds like if we wanted to limp along, we can do so by manually running manage-projects (with possibly needing the fix fungi describes)
19:23:27 <mordred> could we just ban completely empty projects for now to avoid 3?
19:23:37 <jeblair> mordred: i would like to avoid that
19:23:47 <mordred> k
19:23:48 <clarkb> fungi: I am not convinced it is related to the group or acl things, since empty repo creation is handled entirely by gerrit
19:24:01 <jeblair> mordred: i personally don't ever want to say "you are required to use github in order to use our system"
19:24:22 <mordred> jeblair: yes. I completely agree
19:24:27 <clarkb> fungi: but I need to do more debugging myself to understand what happens
19:24:36 <fungi> clarkb: agreed. it's a stab in the dark. could be an internal gerrit issue which is fixed in later releases
19:24:38 <jeblair> so if we're going to run it at all, then i think we need to just eat the cost of empty repos ourselves until it's fixed
19:25:34 <jeblair> i'm personally ok with saying "no new stackforge projects until this is fixed".  but the manual process sounds tolerable, and we might be able to learn how to fix problem 3.
19:25:40 <jeblair> so i'm okay with that too.
19:26:19 <jeblair> mordred, clarkb, fungi: since the 4 of us will have to manually run this thing: do you want to continue to accept new projects and run by hand as discussed, or would you prefer a moratorium?
19:26:46 <fungi> i think if we're going to stop accepting new stackforge projects for a while, we should make a fairly broad announcement
19:27:06 <mordred> jeblair: I'm fine with running manually when one of us feels we have time - but we should make it clear to folks that we're probably going to batch them up or something
19:27:08 <jeblair> fungi: i agree, and a clear one that states the problem and when we expect it to be fixed
19:27:33 <jeblair> "new project friday"? :)
19:27:39 <mordred> :)
19:27:40 <clarkb> mordred: batch up ++
19:27:55 <fungi> otherwise for some (unfortunately large number of) contributors this is their first commit and interaction with gerrit/infra, so going through all that effort only to be told to go away is not going to give them a great impression of our processes
19:27:56 <mordred> oh - you know what?
19:28:08 <mordred> how about we request that people put the topic "new-project"
19:28:13 <mordred> so that we can find new project requests
19:28:19 <jeblair> mordred: ++
19:28:22 <clarkb> mordred: ++
19:28:35 <mordred> that'll make them easy to batch review and batch apply
19:28:36 <fungi> i think that would help
19:28:57 <jeblair> so, batch manual creation weekly?
19:29:02 <fungi> heh... new project fridays
19:29:03 <jeblair> on fridays?
19:29:06 <clarkb> ++
19:29:09 <mordred> I can send a quick email - but we should probably put a note in the stackforge docs
19:29:12 <mordred> about it
19:29:16 <mordred> (about the topic)
19:29:23 <fungi> sure. and then i'll go out and drink heavily after i'm done running (and cleaning up after) it
19:29:27 <mordred> ++
19:29:40 <jeblair> #agreed new projects should have the topic "new-project"
19:29:40 <jeblair> #agreed we will batch create new projects weekly on fridays
19:29:54 <jeblair> okay, final thing
19:29:57 <clarkb> fungi: if you get a chance can you direct me to the puppet logs of failed non seeded project creations?
19:30:04 <clarkb> fungi: I will take a look at them to get more eyes on it
19:30:05 <jeblair> who's going to fix this, and when do we expect it to be fixed?
19:30:16 <fungi> clarkb: i think i pasted it into the open bug, but i'll confirm
19:30:41 <jeblair> i'd like to include that in the communication around this as well, because this is still a service degradation.
19:30:56 <jeblair> and also, i don't actually want to agree to run them manually if it's not going to be fixed
19:31:14 <fungi> i completely agree with that sentiment
19:31:14 <mordred> jeblair: I'll fix it - I've started poking at the code anyway
19:31:27 <mordred> and I think it's an important thing to have properly automated
19:31:54 <anteaya> mordred: can you drag me along, I would like to learn
19:32:00 <mordred> anteaya: yup
19:32:03 <anteaya> thanks
19:32:12 <jeblair> mordred: ok.  can you estimate when you might have something working enough for us to start trying it in production?
19:32:13 <SergeyLukjanov> ++ for batching new projects requests
19:32:30 <mordred> jeblair: depends on how much hacking I can get done on the plane on friday
19:32:51 <mordred> can I give you an estimate on that next meeting?  I don't think the code split/clean is THAT much work
19:33:17 <SergeyLukjanov> btw, I think that I can help with manually creating projects if someone teaches me / grants permissions... if it sounds useful
19:34:23 <jeblair> mordred: yeah, but the salt thing is more complex and vague
19:34:34 <mordred> jeblair: yes. that is true
19:34:39 <jeblair> mordred: part of the problem is that we've been saying 'oh we should have salt run m-p' for years
19:35:07 <anteaya> yes, at least since last year
19:35:09 <jeblair> so i want to have an end date for this
19:35:12 <mordred> jeblair: well, step one is to get the scripts split up. step two is to be able to run _Those_ scripts in the proper sequence by hand to verify that it works as expected
19:35:22 <mordred> step three would be to have salt run the sequence of things we ran by hand, yeah?
19:35:25 <mordred> jeblair: ++
19:35:28 <mordred> to end date
19:35:48 <mordred> let's call it a drop-dead of mid-march
19:36:05 <mordred> I think it's entirely reasonable to get it done by then
19:36:26 <jeblair> okay, so we should have it working by ttx's ski vacation
19:36:41 <fungi> so that we can create the new ttx-goes-skiiing project
19:36:42 <mordred> wow. I love that we all know when ttx goes on vacation
19:37:07 <jeblair> #agreed new system ready for production-testing by mid-march
19:37:19 <pleia2> hehe
19:37:32 <anteaya> mordred: you do remember I am gone for the second week of march
19:37:50 <mordred> anteaya: nope. I do not remember that. let's try to get it done sooner then :)
19:38:01 <jeblair> okay, so i think we should send out an announcement that says to expect delays in new project creation through the end of march (let's give us time to deal with issues we find with the new system)
19:38:13 <mordred> nod
19:38:15 <anteaya> mordred: okay I will find you
19:38:16 <jeblair> and it should also mention the topic name, and that we will batch create on fridays
19:38:19 <mordred> I can send that
19:38:28 <clarkb> sounds good to me
19:38:41 <jeblair> #action mordred send new-project service degradation announcement
19:39:11 <jeblair> mordred: thanks!  i'm very excited we have a plan
19:39:26 <jeblair> anything else about this?
19:40:02 <jeblair> #topic Pip 1.5 readiness efforts (fungi, mordred)
19:40:25 <clarkb> I can jump right in if we want to still consider this tied to tox
19:40:34 <fungi> i can't remember if i've done anything new on this. mordred?
19:41:27 <fungi> looks like we've had new and unfortunate activity on clarkb's tox pull request
19:41:32 <clarkb> yes
19:41:35 <jeblair> link?
19:41:39 <fungi> #link https://bitbucket.org/hpk42/tox/issue/150/posargs-configerror
19:41:50 <fungi> #link https://bitbucket.org/hpk42/tox/pull-request/85/fix-command-expansion-and-parsing/activity
19:42:06 <fungi> the first was linked from the second (the latter being his pr)
19:42:15 <clarkb> so I hopped over to #pylib yesterday and spoke about this
19:43:01 <clarkb> there are a couple issues at hand as I gain broader understanding of tox. Basically prior to 1.7.0 posargs were special in the command processing and worked the way we want them but all other variables worked as posargs does today
19:43:13 <clarkb> so they made tox more consistent in less flexible and backward incompatible ways
19:43:31 <clarkb> my change goes the complete opposite direction and tries to make them consistent and more flexible in backward incompatible ways
19:43:46 <clarkb> but since my change comes second it is apparently not ok that it is backward incompatible
19:44:14 <clarkb> ronny's suggestion is to do a different type of variable substitution instead for what we want
19:44:28 <clarkb> but also would like holger's feedback when holger is back from vacation in a week and a half
19:44:55 <anteaya> what is the fallout if we do a openstack-tox branch?
19:45:04 <jeblair> i think we were going to call it xot
19:45:14 <clarkb> I like botox
19:45:15 <sdague> or just bring back run_tests.sh
19:45:32 <clarkb> upstream is being receptive though
19:45:39 <fungi> intoxicated
19:45:43 <clarkb> I think they understand the problem now and appreciate how 1.7.0 broke them
19:45:48 <clarkb> s/them/us/
19:45:49 <fungi> (that's a bit much to type though)
19:45:58 <sdague> clarkb: will there be a way to be compatible across that boundary
19:46:15 <dhellmann> fungi: the command would be "drink"
19:46:16 <clarkb> sdague: yeah by introducing a new feature that does sane argument parsing
19:46:30 <sdague> clarkb: ok
19:46:31 <fungi> dhellmann: you have a good point
19:46:50 <jeblair> clarkb: remind me why pip 1.5 might be dependent on this?
19:46:50 <mordred> so - quick question -
19:47:05 <mordred> jeblair: we get pip via tox
19:47:10 <clarkb> jeblair: because tox 1.7.0 bundles pip 1.5 so as long as we use tox 1.6.1 we use older pip
19:47:14 <jeblair> gotcha
19:47:17 <fungi> if we want a tox version which includes pip 1.5 or later
19:47:19 <fungi> that
19:48:13 <clarkb> we do have a few potential workarounds if we want to use tox 1.7.0
19:48:30 <clarkb> we can stop using python setup.py test and go back to the small shell snippet
19:48:41 <clarkb> we can update python setup.py test to treat posargs as testrargs
19:48:47 <clarkb> basically do what tox does in setup.py
19:49:04 <clarkb> both of these changes will work with tox 1.6.1 and 1.7.0
19:49:53 <clarkb> I still maintain tox is broken and my arg parser is better >_> but I don't see me winning that battle
19:49:54 <mordred> I'd rather do the second thing - but I think I failed last time I tried doing that
19:49:54 <jeblair> but both of them require updating all branches of all repos
19:50:01 <clarkb> jeblair: correct
19:50:03 <fungi> but will require changes to all projects (or can we shim that with pbr?)
19:50:04 <mordred> because the args string is actually parsed by setuptools and then handed to us
19:50:32 <mordred> clarkb: what gets passed to tox in the broken version?
19:50:44 <clarkb> mordred: an argv with posargs in it
19:50:57 <clarkb> which is then appended to the argv built from the command
19:51:19 <clarkb> so it doesn't get shlex'd in the way you might expect, the literal argv array is appended to the argv passed to exec
19:51:25 <mordred> yeah
19:51:28 <mordred> gotcha
19:51:31 <clarkb> it's a cheap way to ignore escaping rules but isn't very flexible
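
The difference clarkb describes can be illustrated with a small sketch contrasting two ways of handling positional arguments in a command template. This is not tox's code. The function names and the `runner` command are hypothetical, and `{posargs}` is used only as the placeholder named in the discussion.

```python
import shlex


def expand_then_split(template, posargs):
    """Substitute the args into the command string, then tokenize it."""
    joined = " ".join(posargs)
    return shlex.split(template.replace("{posargs}", joined))


def split_then_append(template, posargs):
    """Tokenize the command first, then splice the raw argv list in
    wherever a standalone {posargs} token appears."""
    argv = []
    for token in shlex.split(template):
        if token == "{posargs}":
            argv.extend(posargs)  # literal argv, no re-parsing or escaping
        else:
            argv.append(token)
    return argv


# An argument containing a space survives only the splice approach.
print(expand_then_split("runner {posargs}", ["-k", "a b"]))
print(split_then_append("runner {posargs}", ["-k", "a b"]))

# A placeholder embedded in a larger token is expanded only by the
# substitute-first approach.
print(expand_then_split("runner --args='{posargs}'", ["-k", "x"]))
print(split_then_append("runner --args='{posargs}'", ["-k", "x"]))
```

Splicing the argv list sidesteps escaping entirely, which is why it is cheap, but it cannot place the arguments inside a quoted option value, which is the flexibility lost.
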
19:52:13 <jeblair> clarkb, mordred: do you want to make a decision now, or wait a few weeks for holger?
19:52:29 <mordred> I think we should wait a few weeks
19:52:30 <clarkb> I am tempted to wait for holger so that we are good citizens
19:52:39 <mordred> I don't want to touch pip during ff period anyway
19:52:45 <jeblair> good point
19:53:07 <jeblair> so we're probably actually looking at what to do for J at this point?
19:53:16 <clarkb> yes
19:53:42 * fungi concurs. slushy from this point onward i think
19:53:53 <jeblair> #agreed wait for holger to weigh in on tox changes; target pip 1.5 for Juno
19:54:01 <ttx> slushy slush
19:54:22 <jeblair> #topic Open discussion
19:54:57 <jeblair> i don't think we have sufficient time for any other major items, so anyone have any smaller items to mention?
19:55:16 <mordred> there is some pypi-mirror refactoring up
19:55:28 <SergeyLukjanov> fyi I've started working on extracting common jobs to templates for layout.yaml, starting from requirements check and pypi release
19:55:30 <fungi> new elasticsearch cluster is on the way, maybe by my dinnertime
19:55:42 <zaro> any ETA on az2 so i can run a working build?
19:55:57 <jeblair> zaro: i haven't heard anything new
19:55:58 <mordred> that YorikSar helped me make better than I was making it - which was all driven by trying to re-think how pypi-mirror deals with errors and stuff
19:56:59 <fungi> also the change to no longer download wheels in pypi-mirror merged yesterday, and we're getting complete mirroring once more
19:57:19 <anteaya> yay
19:57:21 <clarkb> fungi: it may take longer than that :P depends on how long data ends up replicating to new nodes allowing us to shut off old ones (we only have one node of redundancy so can't turn everything off all at once)
19:57:29 <fungi> oh, and i've proposed we stop running grenade on havana a week early
19:57:31 <clarkb> fungi: woot
19:58:25 <mordred> fungi: ++
19:58:30 <jeblair> thanks all!
19:58:31 <jeblair> #endmeeting