19:01:36 <jeblair> #startmeeting ci
19:01:37 <openstack> Meeting started Tue Nov 13 19:01:36 2012 UTC.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:39 <openstack> The meeting name has been set to 'ci'
19:02:08 <jeblair> #topic actions from last meeting
19:02:32 <jeblair> fungi: any movement on foundation server stuff?
19:02:39 <fungi> talked to reed again today
19:03:01 <fungi> he's sitting across a conference table from toddmorey for the next two days, and will try to get some info/movement on it
19:03:37 <fungi> my patches are probably a bit stale, so i'll rebase them again
19:03:43 <jeblair> #action toddmorey provide a test foundation server
19:03:43 <fungi> other than that, nothing
19:04:44 <jeblair> i do not think that mordred updated the bug list
19:04:54 <clarkb> I am pretty sure he hasnt
19:04:57 <jeblair> #action mordred bugify summit actions
19:05:08 <jeblair> #action everyone collect action items from other summit session etherpads and register as bugs
19:05:18 <jeblair> and i confess, i have not done that second thing yet myself.
19:05:45 <jeblair> I _have_ deconfigured nova-volume testing on master...
19:06:01 <jeblair> so the current devstack-gate only runs cinder on master
19:06:03 <clarkb> I did put a thing or two on the state-of-ci list so that mordred would do it :)
19:06:16 <jeblair> and runs cinder+n-vol for folsom, and n-vol for <folsom
19:06:35 <jeblair> i think the mechanics for that will work out well for similar projects, like quantum
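
A minimal sketch of the branch-based selection jeblair describes (cinder only on master, cinder plus nova-volume on folsom, nova-volume only before folsom). The real devstack-gate does this in its own scripts; the branch name 'stable/folsom' and the function here are illustrative assumptions, not the actual code:

    def volume_services(branch):
        """Return the volume services to enable for a given branch (sketch)."""
        if branch == 'master':
            return ['cinder']               # master: cinder only
        elif branch == 'stable/folsom':
            return ['cinder', 'n-vol']      # folsom: run both side by side
        else:
            return ['n-vol']                # pre-folsom: nova-volume only

    print(volume_services('stable/folsom'))  # ['cinder', 'n-vol']
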
19:07:01 <jeblair> #topic grenade / quantum
19:07:41 <jeblair> These haven't been progressing much; and I need to spend some time tracking people down and trying to get them moving again.
19:08:39 <clarkb> what is left for quantum?
19:08:43 <jeblair> dtroyer suggested that grenade may be making some assumptions about where upgrade data are stored that aren't compatible with running it in the devstack gate.
19:08:46 <jeblair> as for quantum
19:08:55 <jeblair> there is this change:
19:08:57 <jeblair> #link https://review.openstack.org/#/c/14990/
19:09:10 <jeblair> which I'd like to get some nova-core people looking at...
19:09:53 <jeblair> particularly since it seems to wrap a lot of the devstack exercises in "if using quantum...; else..."
19:10:27 <clarkb> mordred just walked into a different meeting. fyi...
19:10:55 <jeblair> <sigh>
19:11:44 <jeblair> #topic testr and friends
19:11:55 <jeblair> clarkb: what's up with testr?
19:11:55 <mordred> jeblair: hey man - some of us have to walk in to meetings sometimes
19:12:06 <pabelanger> o/
19:12:26 <mordred> I have patches that get nova all the way to using testr
19:12:27 <clarkb> so I haven't been able to do much with testr while you guys were conferencing, but we haven't had a meeting over that time period either...
19:12:28 <jeblair> mordred: glad you're here.  you're up next.  :)
19:12:44 <mordred> except - the last few patches make things SLOW
19:12:50 <jeblair> that's sad
19:12:56 <mordred> I have not yet been able to diagnose it
19:12:57 <clarkb> I have a patch that basically got testr mostly working so that nova devs could look at it and play with it
19:13:06 <clarkb> I think jog0 was one of the few to really take a look at it.
19:13:13 <jeblair> clarkb, mordred: are these the same or different patches?
19:13:28 <clarkb> jeblair: different; mine was more "just get it to go" and mordred's is more "make it work properly"
19:14:05 <clarkb> one comment from jog0 was wondering if we could have nose and testr as options...
19:14:09 <jeblair> no
19:14:27 <clarkb> I kind of figured we didn't want to support both. However, getting coverage with testr may be tricky
19:14:42 <fungi> run_tests.sh and tox options are already causing enough confusion on what will pass ci testing
19:14:52 <jeblair> the attempt to move from run_tests to tox left us with two ways of running tests.  i really don't want four.
19:14:54 <jeblair> fungi: exactly.
19:14:56 <mordred> I imagine it has something to do with making database init proper fixtures
19:15:21 <clarkb> the problem with testr and coverage is testr runs everything in different processes and relies upon a line protocol so you can't just run it under coverage to get that info
19:16:03 <clarkb> each individual process would need to be told to run under coverage then you will need to merge the results. certainly possible, just something I haven't sorted out yet
19:16:32 <mordred> clarkb: I think we should not care about testr for coverage tests
19:16:41 <jeblair> mordred: how will we run coverage tests then?
19:16:54 <mordred> any of the normal test runners
19:17:04 <mordred> coverage has a wrapper
19:17:05 <jeblair> i thought testr was more "normal" than nose
19:17:08 <mordred> no
19:17:24 <mordred> testr requires that the unittests _themselves_ operate in the usual unittest protocol
19:17:38 <mordred> but it uses quite a complex pipeline approach, suitable for running tests but not for doing other things
19:17:57 <jeblair> mordred: then what does it yield us if we still have to run nose?
19:18:25 <mordred> running nose for the coverage tests should be fine - because if that fails weirdly, whatever
19:18:39 <mordred> the tests will have been correctly run via the unittests
19:18:56 <mordred> alternately
19:19:17 <jeblair> mordred: that sounds nice, but i just heard "there are now twice as many ways running tests can break".
19:19:37 <mordred> there's still only one way to run tests - coverage is a post-analysis thing
19:19:42 <mordred> but I hear you
19:19:44 <mordred> it's just a thought
19:19:57 <mordred> the other option is going to be patching subunit/testtools to grok the coverage library
19:20:10 <jeblair> mordred: actually, we've been talking about making it pre-merge, so you can factor coverage changes into merge decisions.
19:20:13 <clarkb> I think we can just run subunit/testtools under coverage
19:20:17 <mordred> ok. nevermind then
19:20:21 <mordred> we need to patch subunit
19:20:27 <clarkb> mordred: why?
19:20:31 * mordred poked at this on the plane
19:20:37 <clarkb> :(
19:20:59 <mordred> I could be wrong - I'm just pretty sure
19:21:02 <mordred> please prove me wrong :)
19:21:04 <jeblair> (and running it pre-merge means that coverage run-time affects overall check test run time, btw)
19:21:36 <clarkb> I want to say that you can run subunit under coverage and as long as it doesn't fork you get all the details
19:21:37 <mordred> yeah. k. let's make coverage work with testr then
19:21:59 <mordred> we need to be able to work with it in parallel mode - which is why we need patching I think
19:22:23 <clarkb> or have some external way to merge multiple coverage reports
19:22:31 <jeblair> clarkb: you want to continue hacking on that?
19:22:35 <clarkb> but yes, patching subunit/testtools is a possibility.
19:22:37 <clarkb> jeblair: yes
19:22:59 <jeblair> #action clarkb look into subunit/testtools with coverage
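
A minimal sketch of the merge step clarkb is signing up to investigate, assuming each testr worker has already been run under coverage with parallel data files enabled (so per-process .coverage.* files exist in the working directory); the class name differs between coverage.py releases:

    import coverage

    cov = coverage.Coverage()  # older coverage.py releases spell this coverage.coverage()
    cov.combine()              # merge the per-worker .coverage.* data files
    cov.save()                 # write the combined .coverage file
    cov.report()               # print a per-module summary of the merged data
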
19:23:21 <clarkb> mordred: can you #link your nova change for testr?
19:24:31 <mordred> #link https://review.openstack.org/#/c/14949/
19:24:49 <jeblair> shall we move onto project creation?
19:25:02 <clarkb> yes, I think we have covered testr for now
19:25:06 <jeblair> #topic automagic project creation
19:25:11 <jeblair> #link https://review.openstack.org/#/c/15352/
19:25:28 <jeblair> this seems nearly ready to go!
19:25:37 <mordred> yes. just needs docs
19:25:44 <jeblair> didn't clarkb write some?
19:25:51 <clarkb> no, I fixed the technical issues
19:25:53 <jeblair> ah
19:26:05 <clarkb> that said mordred if you are in meetings all day I can crank out docs
19:26:19 <mordred> clarkb: please. my day got bitchslapped
19:26:29 <clarkb> ok, I will do that today
19:27:07 <clarkb> then, once those are written we should probably do another round of testing on review-dev to catch any potential failures in our recent updates
19:27:18 <clarkb> mordred: or have you been testing as you went?
19:27:26 <mordred> clarkb: I have not tested the group-add change
19:27:32 <clarkb> k
19:28:11 <jeblair> #action clarkb document and test project creation change
19:28:28 <jeblair> i'm very excited about merging that and being ready for the expected onslaught of new projects.
19:28:35 <mordred> ++
19:28:35 <clarkb> for those following along at home this change puts gerrit project management in puppet
19:28:59 <clarkb> so that anyone can propose new gerrit projects and have them automagically created when the puppet change is approved through gerrit
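
A hypothetical sketch of what that automation boils down to: compare a desired project list against what Gerrit already knows about and create anything missing over the Gerrit ssh CLI. The host, project list, and options below are placeholders, not the real manifest format:

    import subprocess

    GERRIT_SSH = ['ssh', '-p', '29418', 'review.example.org', 'gerrit']
    WANTED = ['openstack/some-new-project']   # placeholder project list

    def existing_projects():
        out = subprocess.check_output(GERRIT_SSH + ['ls-projects'])
        return set(out.decode().splitlines())

    def create_missing(wanted):
        have = existing_projects()
        for project in wanted:
            if project not in have:
                # --empty-commit seeds the new repository with an initial commit
                subprocess.check_call(
                    GERRIT_SSH + ['create-project', '--empty-commit', project])

    if __name__ == '__main__':
        create_missing(WANTED)
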
19:29:52 <jeblair> #topic gerrit user sync script
19:30:18 <jeblair> now that there is an api call in launchpad to look up a user given an openid,
19:30:32 <jeblair> we can have the gerrit sync script automatically correct the situation where a user logs into gerrit with an unexpected openid
19:30:55 <jeblair> it's not a perfect solution to the problem, but it should eliminate the need to ask the lp admins to manually correct the situation.
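
A minimal sketch of that lookup using launchpadlib; the method name getByOpenIDIdentifier comes from the Launchpad API docs and should be treated as an assumption here, as should the placeholder application name and OpenID URL:

    from launchpadlib.launchpad import Launchpad

    def person_for_openid(openid_url):
        # 'gerrit-user-sync' is just a placeholder application name
        lp = Launchpad.login_with('gerrit-user-sync', 'production')
        return lp.people.getByOpenIDIdentifier(identifier=openid_url)

    person = person_for_openid('https://login.launchpad.net/+id/example')
    if person is not None:
        print(person.name)
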
19:31:13 <jeblair> i've started working on that, and since i have to fully comprehend the sync script in order to implement it...
19:31:23 <jeblair> i'm trying to leave it in a better state than i found it
19:31:50 <mordred> you're ripping it down to the new group sync semantics, yeah?
19:31:58 <jeblair> which means hopefully more modular and maintainable, along with a few technical changes:
19:32:11 <jeblair> yes, one is that it will only sync groups that exist in gerrit
19:32:31 <jeblair> which should cut down on syncing tons of unnecessary groups and perhaps thousands of users.
19:32:55 <fungi> potential major runtime improvement there
19:33:08 <jeblair> another is to cache all the LP data at the start of the script, and move the actual database writes to the end, so that the time spent holding write locks in mysql is much smaller
19:33:53 <jeblair> so we should be able to actually use gerrit group admin functions again, which we pretty much can't because the script is always holding a write lock on the groups tables.
19:34:13 <fungi> in which case we might not have to worry so much about turning off the sync script during maintenance involving gerrit db changes too
19:34:17 <mordred> excellent - once that's in - I want to delete the useless groups too
19:34:22 <jeblair> mordred: +1
19:34:48 <jeblair> i should have something for review soon, but i'm also going to try to make one more improvement:
19:34:50 <clarkb> ++
19:35:21 <jeblair> a debug mode that caches the LP data in a pickle for re-use across runs so that we can actually test and debug the script in human rather than geological time.
19:35:51 <mordred> :) funny story - I had something similar to that in the VERY FIRST version of the script
19:35:53 <mordred> oops
19:36:02 <jeblair> #action jeblair finish updates to sync script.
19:36:21 <jeblair> mordred: yeah, it's pretty important.  this poor script has seen a lot of action.  :(
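
A hypothetical sketch of the debug-mode cache jeblair describes: dump the Launchpad results to a pickle on the first run and reuse them on later runs so the script can be iterated on quickly. The file name and fetch callback are made up:

    import os
    import pickle

    CACHE_FILE = 'lp-cache.pickle'   # made-up name

    def get_lp_data(fetch_from_launchpad, use_cache=False):
        if use_cache and os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, 'rb') as f:
                return pickle.load(f)      # reuse data from a previous run
        data = fetch_from_launchpad()      # slow: talks to the Launchpad API
        if use_cache:
            with open(CACHE_FILE, 'wb') as f:
                pickle.dump(data, f)
        return data
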
19:37:14 <jeblair> #topic ci-issues-log
19:37:31 <jeblair> clarkb: want to talk about your idea?
19:37:34 <clarkb> #link https://etherpad.openstack.org/ci-issues-log
19:38:05 <clarkb> at the summit there was a lot of mention about when the infrastructure failed and when things couldn't merge and so on
19:38:24 <clarkb> and we weren't tracking these issues very well
19:38:27 <fungi> assertions about perceived gate "instability"
19:38:58 <clarkb> now, these things don't always end up being bugs in the infrastructure or even things related to what we do, but the perception is there
19:39:12 <jeblair> perceived is a good word, because at this point the infrastructure very rarely fails.
19:39:22 <clarkb> so filing bugs against openstack-ci for things we can never fix or don't have a hand in doesn't make sense
19:39:34 <clarkb> but we still want to track this so I started the above etherpad
19:39:58 <jeblair> it's useful for that, but i think it's actually more useful as a communication tool for ourselves...
19:40:11 <clarkb> basically when something fails jot it down in there
19:40:17 <clarkb> jeblair: yes, it has been useful for that
19:40:33 <clarkb> being able to keep up to date with the latest status of a particular issue is helpful
19:40:47 <jeblair> i find it's valuable to see what has been happening and what other people have been doing, for exactly the reason a ship's log is useful to crews going on and off shift.
19:41:54 <jeblair> but it can also be a tool for exposing what's going on (and what's going wrong) to the wider community...
19:42:23 <jeblair> but whether etherpad is the best tool for that is an open question
19:42:32 <clarkb> ya, I am still not sold on it
19:42:40 <jeblair> it's great that we can all edit it and keep things up to date
19:43:03 <fungi> at my last job, we used a private wordpress instance for that, but it wasn't really ideal either
19:43:26 <fungi> i like the etherpad better in that the content is more granularly collaborative and wikilike
19:43:37 <clarkb> we could potentially use a git repository
19:43:44 <clarkb> to have stronger versioning and history
19:44:01 <fungi> or a wiki page...
19:44:03 <jeblair> we could try publicising it and see what happens; i guess my only concern is that misinformation or less-useful information, or "problem dumps" start showing up there.
19:44:05 <clarkb> or a wiki page
19:44:27 <clarkb> jeblair: ya, I don't really see it as a user reporting tool
19:44:34 <clarkb> the info there should be pre filtered
19:44:35 <fungi> etherpad and wiki are both not natively great for keeping long and continuously updated logs of things though, i think
19:45:00 <clarkb> so that it isn't ambiguous to the next shift whether things have been filtered
19:45:22 <fungi> i think i want something bloglike over the long term but wikilike over the short term
19:45:34 <fungi> not really sure such a thing exists
19:45:57 <clarkb> we could use a static content blog system backed by git
19:46:13 <jeblair> clarkb: i think quick updates are key
19:46:40 <jeblair> i really hate heavyweight reporting tools.
19:46:41 <fungi> i agree. and at that, etherpad is great. normal wikis somewhat but not quite as much. blogs and git far less so
19:47:12 <jeblair> clarkb: that doesn't exclude your idea, but i think it suggests that maybe it should be wrapped with quick scripts or something.
19:47:31 <fungi> maybe something that scraped a daily etherpad into a git-backed blog entry?
19:48:01 <jeblair> we could also write a web app that's half wiki and half blog.  click to edit the most recent entries, automatic archiving of old ones...
19:48:16 <jeblair> okay, so more brainstorming about this, but it seems several of us really like the idea.  :)
19:48:24 <fungi> yes
19:48:38 <jeblair> on a related note, flakey tests...
19:48:42 <jeblair> #topic flakey tests
19:49:12 <jeblair> there have been a lot of flakey tests lately, obviously, and the issues log is at least partly a response to that
19:49:17 <clarkb> yes
19:49:19 <jeblair> we've been sort of a de-facto clearinghouse for information about the tests
19:49:39 <fungi> or front desk for complaints about anyway
19:49:55 <jeblair> which is a useful thing to do, but i think it's distracting us from doing the things we're rather better at than being a help desk.
19:50:06 <mordred> ++
19:50:23 <mordred> jaypipes: you around? this might be a convo you should be in on...
19:50:36 <clarkb> my initial instinct is to take away reverify
19:50:47 <mordred> same here
19:50:54 <clarkb> you can recheck to see if your patch is actually bad
19:50:55 <mordred> although it will cause an immediate revolt
19:51:07 <clarkb> but to merge your code you must take some ownership of the failures
19:51:18 <clarkb> and the core members can re authorize if need be
19:52:00 <jeblair> yes, removing recheck/reverify doesn't stop you from merging code, but it escalates problems.
19:52:25 <jeblair> and given how solid the infrastructure is, i feel comfortable doing it from that point of view... however...
19:52:33 <jeblair> it seems like either the code or tests or both are kind of crap right now.
19:52:35 <clarkb> that said I think the flakeyness is pretty visible and that hasn't helped the troubleshooting much
19:53:11 <jeblair> and it will really annoy people that their changes are harder to get merged (even if it's the fault of their co-devs)
19:53:22 <jeblair> so, how about this for a compromise:
19:53:25 <torgomatic> but if the flaky code is in another project, then that makes developers' lives harder
19:53:43 <jeblair> torgomatic: indeed, exacerbating that point.
19:53:47 <fungi> there is only one project, and that project is openstack
19:54:00 <torgomatic> for example, if the devstack gate fails due to some Cinder thing when run on a commit in Swift, there's about a 0.0% chance that I (as a Swift dev) can go fix it
19:54:01 <jeblair> fungi: that's right too.  :)
19:54:18 <jeblair> anyway, idea: recheck/reverify require a bug link.
19:54:28 <clarkb> torgomatic: correct, which is why the core members being able to re authorize is important.
19:54:39 <fungi> maybe you don't fix it, but you involve devs for the component which is suspect
19:54:42 <clarkb> torgomatic: but, in doing so those core members should be working with the other projects to sort out the problems
19:54:50 <jeblair> so you have to at least diagnose/triage the problem enough to identify an existing bug in the correct project, or report a new one.
19:54:53 <clarkb> (this is my thought of how things would work in an ideal world)
19:55:44 <jeblair> and we whip up a report of the most active/recent bug links attached to reverify/rechecks
19:56:06 <fungi> it's also an incentive to step up scrutiny of stability for new openstack components during incubation, since everyone becomes responsible for it being smooth once we gate on it
19:56:09 <jeblair> so that they can be quantified, tracked in the project meeting, and hopefully more dev attention focused on them.
19:56:34 <clarkb> I really like that
19:57:03 <clarkb> may be less useful for rechecks as it could be the patch itself that is broken
19:57:18 <clarkb> but being able to track and quantify is a giant step above where we are now
19:57:27 <fungi> yeah, tying every failure to a documented bug report (even a vague one), would be great
19:57:33 <torgomatic> it can be difficult to know which codebase something else is in, though
19:58:14 <clarkb> torgomatic: we can move bugs around projects
19:58:16 <jeblair> yeah, but if the volumes test fails, you can at least start with a bug against cinder, and if that's not right, it can be moved to the right project on later inspection
19:58:28 <torgomatic> clarkb: fair enough
19:58:36 <clarkb> I think if a bug is submitted with general failure details then as part of the troubleshooting that info can become more solid
19:59:15 <mordred> (lurking - but can we suggest that the bug gets a flaky-ci tag or something so that they can be raised in the weekly meetings?)
19:59:27 <jeblair> #action jeblair propose a system for linking reverifies to bugs
19:59:28 <jeblair> mordred: +1
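
A hypothetical sketch of the report jeblair proposes, assuming recheck/reverify comments take the form "recheck bug <number>" / "reverify bug <number>" and that we have them available as plain strings; the bug numbers shown are placeholders:

    import re
    from collections import Counter

    RECHECK_RE = re.compile(r'^(?:recheck|reverify) bug (\d+)$', re.IGNORECASE)

    def tally_bugs(comments):
        """Count how often each bug number is blamed in recheck/reverify comments."""
        counts = Counter()
        for comment in comments:
            match = RECHECK_RE.match(comment.strip())
            if match:
                counts[match.group(1)] += 1
        return counts

    comments = ['recheck bug 123456', 'reverify bug 123456', 'recheck bug 654321']
    for bug, count in tally_bugs(comments).most_common():
        print('bug %s: %d' % (bug, count))
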
19:59:40 <jeblair> and we're out of time
19:59:46 <jeblair> thanks everyone!
19:59:49 <jeblair> #endmeeting