Wednesday, 2013-08-21

mgagnehow are repositories named on the filesystem? with or without .git suffix?00:00
clarkbmgagne: with00:00
clarkbjeblair: pleia2 I can fetch that neutron ref now00:02
clarkbshould I go ahead and add symlinks for all the things?00:02
clarkbor should we focus on actual solution now that we know that is sufficient00:02
*** datsun180b has quit IRC00:02
*** reed has quit IRC00:03
jeblairclarkb: i'd add the symlinks for the existing projects00:04
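A rough sketch of the symlink workaround being discussed, assuming the bare repos live under /var/lib/git and carry a .git suffix on disk (both the path and the loop are assumptions, not the actual layout on git.openstack.org):

    # create a suffix-less alias next to each bare repo, skipping any that already exist
    for repo in /var/lib/git/openstack*/*.git; do
        link="${repo%.git}"
        [ -e "$link" ] || ln -s "$repo" "$link"
    done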
mgagneWho is therefore returning a URL without .git in it?00:04
jeblairmgagne: exactly, that's the question; is it a difference in the git version?00:05
mgagnejeblair: could it be the git client?00:05
clarkbwell our test scripts are hard coded to use paths without .git00:05
jeblairclarkb: yes, but our scripts don't fetch packfiles, git does00:06
clarkbso either something is adding it when we talk to review.o.o or our rewrite and aliasmatch stuff is munging it or git version makes a difference00:06
jeblairclarkb: so either the git client or git server is doing something unexpected00:06
jeblairclarkb: remember, almost no pack files were retrieved from review.o.o00:06
jeblairclarkb: i am not certain this is a difference00:06
mgagneif it's related to the git client, I believe that both should be supported (w/ and w/o .git) to avoid frustration and issues with the enduser/devs00:07
jeblair2013-08-20 23:57:07.094 | error: Failed connect to git.openstack.org:443; Connection refused while accessing https://git.openstack.org/openstack/tempest/info/refs00:08
mgagneWhether it should be handled by Apache or by the filesystem, I don't know what's best.00:08
fungiyeah, pretty sure no amount of filesystem or cgi adjustments are going to solve a connection refusal from apache00:12
jeblairfungi: different problem00:12
fungigranted00:16
fungibut suspect we could be hitting overall connection limits too00:17
jeblairfungi: yep00:18
fungigit.o.o is acting a lot more hammered than review.o.o was, even though the load average isn't near as high00:19
SpamapSdo we run devstack on a py26 system in the gate?00:19
SpamapSor just unit tests?00:19
fungiSpamapS: just unit tests00:19
SpamapSI think python-novaclient may be uninstallable in py2600:19
SpamapSAttributeError: 'module' object has no attribute '__getstate__'00:19
*** dkliban has quit IRC00:19
SpamapS      File "/usr/local/lib/python2.6/dist-packages/setuptools/sandbox.py", line 58, in run_setup00:21
SpamapShm thats actually in ye-olde distribute00:21
jeblairclarkb, pleia2: think the pack thing may be a tiny bit of a red herring00:21
jeblairmgagne: ^00:21
mordredjeblair: oh piddle :)00:21
fungiping rtt to git.o.o is averaging 1600ms for me right now, as opposed to review.o.o which is around 55ms00:21
jeblairclarkb, pleia2: i _suspect_ that those files are only retrieved directly by the _dumb_ http client00:21
SpamapSahh have to remove python-pkg-resources00:21
clarkbjeblair: interesting00:22
jeblairthat job fell back on the dumb client because:00:22
jeblair2013-08-20 22:50:00.379 | error: The requested URL returned error: 504 while accessing https://git.openstack.org/openstack/neutron/info/refs00:22
pleia2ah00:22
jeblairit thought the smart client wasn't available00:22
clarkbjeblair: so our rewrites are not working properly00:22
jeblairclarkb: correct, they're just plain wrong but pretty much never used (my hypothesis)00:23
clarkbalso writing a script to make these symlinks that is idempotent and not insane is taking too much time00:23
jeblairclarkb: i would consider abandoning that and deleting the symlinks at this point00:23
mordredSpamapS: you need to pip install -U pip before installing anything via pip currently00:23
mordredSpamapS: if you want to be safe00:23
jeblairand add a medium priority todo fix the rewrites00:23
SpamapSmordred: did that, had to apt-get remove python-pkg-resources00:24
mordredSpamapS: you will pip install -U setuptools00:24
mordredSpamapS: that mainly means that something borked something first00:24
SpamapSmordred: and apt-get remove python-setuptools00:24
mordredthat should not be necessary00:24
mordredbut if something pip installed something first00:24
mordredyou'll need to do that to recover00:24
SpamapSfirst two things I did were exactly that, pip install -U pip, and then pip install -U setuptools00:24
jeblairfungi: yeah. my interactive shell is very slow too.00:24
mordredSpamapS: wow. really?00:24
mordredsigh00:24
mordredSpamapS: this is on precise?00:24
mordredSpamapS: or?00:25
SpamapSyeah.. had to apt-get remove setuptools and then re-do pip install -U setuptools to recover :-/00:25
SpamapSmordred: lucid00:25
mordredoh. jeez00:25
SpamapSmordred: trying to test py2600:25
mordredsorry. I have done zero testing of lucid00:25
fungiand yet load average on git.o.o is in the single digits, not >200 like we saw on review.o.o00:25
mordredgod only knows how broken it is00:25
clarkbjeblair: ok00:25
mordredSpamapS: we have workarounds for that in devstack, which involve wget-ing things00:25
clarkbjeblair: in the mean time a bunch of jobs will fail for random things00:26
mordredSpamapS: the situation is pretty messed up00:26
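For reference, the recovery sequence SpamapS describes for the lucid/py26 box looks roughly like this (a sketch of the steps mentioned above, not a guaranteed fix):

    # untangle a pip-installed setuptools from the distro packages on lucid
    sudo apt-get remove -y python-setuptools python-pkg-resources
    sudo pip install -U pip
    sudo pip install -U setuptools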
clarkbjeblair: do we maybe want to point everything at /cgit for now?00:26
jeblairclarkb: no, we need to make git.o.o responsive00:26
jeblairmoving it around to a different unresponsive thing isn't going to make us happy00:26
jeblairclarkb: that timeout could have happened just as easily talking to cgit00:27
mgagnewhat is git.kernel.org using to serve requests over http?00:28
*** jog0-away is now known as jog000:28
jeblairso how about we go ahead and load balance it, even though we don't have a good config, and we can come back and make it sane later?00:28
jeblairstart throwing hardware at the problem00:28
mordredjeblair: ++00:28
pleia2we're close to a good config00:28
pleia2at least, to limping00:29
fungii thought we had git.o.o in cacti, but i guess not00:29
pleia2https://review.openstack.org/43012 switches us over to service git:// then we bring in clarkb's haproxy patch https://review.openstack.org/#/c/42784/ (will need some edits after my patch)00:29
jeblairpleia2: i meant a config that is correctly tuned for the tradeoffs we have chosen (based on those things we talked about in the meeting)00:29
pleia2jeblair: oh, right00:30
*** nati_ueno has quit IRC00:30
jeblairpleia2: but those are definitely steps in the right direction00:30
clarkbjeblair: so I agree that the pack thing isn't the only issue, but until we fix that redirect or have the symlinks every single one of those fetches will fail00:30
jeblairclarkb: but they should never happen unless there has already been an error00:30
jeblairclarkb: i'm trying to say we've already lost by the time that fetch happens, we need to make sure it never happens00:31
clarkbis that what that means? I was clearly focusing way too hard on symlinks of all things00:31
jeblairclarkb: yeah, that's what i was trying to say earlier -- the smart http client should never fetch those00:32
jeblairclarkb: it only did it because it thought the smart http server wasn't available00:32
jeblairclarkb: supporting those urls only means that if the smart http client fails, our jobs will suck even _more_ data from git.o.o using the dumb client00:32
clarkbyeah00:32
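The rewrite fix being filed as a medium-priority TODO amounts to making sure the smart-HTTP endpoints always reach git-http-backend; when that match misses, git quietly falls back to the dumb protocol and starts pulling loose objects and packfiles directly. A minimal sketch modeled on the git-http-backend(1) examples (the filesystem paths and the optional-.git handling here are assumptions, not the real cgit vhost):

    SetEnv GIT_PROJECT_ROOT /var/lib/git
    SetEnv GIT_HTTP_EXPORT_ALL
    # send the smart-protocol endpoints to the CGI, with or without .git in the URL
    ScriptAliasMatch "^/(.*?)(\.git)?/(info/refs|git-upload-pack|git-receive-pack)$" \
        /usr/libexec/git-core/git-http-backend/$1.git/$3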
clarkbif we are going to go multinode, I wonder if it is worth investigating using precise for the git-http-backend serving00:33
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Start trending git.o.o performance with cacti  https://review.openstack.org/4302300:33
clarkbsince as fungi pointed out git.o.o is feeling the load a lot more than review.o.o00:33
* SpamapS considers snapshotting this lucid box for the next time a python2.6 thing is needed.. so much pip.. so little time00:34
clarkbthen serve /cgit from centos and everything else from ubuntu nodes00:34
mordredjeblair: what about tuning apache to do a much smaller number of active connections and tcp backlog most of them?00:34
* clarkb goes back to cleaning up symlinks00:34
mordredjeblair: to prevent the overloaded/timeout situation?00:34
jeblairthe current situation that fungi pointed out is weird.00:34
jeblairthe load average is low, cpu is mostly idle00:35
jeblairi'm wondering if it's hypervisor host system load00:35
fungisome other sort of starvation here. perhaps interrupt handling?00:35
jeblairor network bottleneck00:35
fungior, yes, something on the host compute node maybe00:35
SpamapSI've only ever seen phantom hypervisor load on xen00:35
mordredhypervisor host system load sounds like an interesting cause for timeouts in that situation00:35
SpamapSkvm has served me well and reported it as "stolen" CPU00:36
jeblairSpamapS: i meant other vms starving the hardware00:36
SpamapSYeah, that should be the same thing.00:36
SpamapS"CPU time that I should have gotten went somewhere else"00:36
SpamapSthat is supposed to be "steal%"00:36
jeblairSpamapS: ah, well, it says its low, 0.0-0.2; are you saying it's unreliable under xen?00:37
SpamapSI have seen it be totally unreliable in xen00:38
SpamapSMost famously in the phantom load seen on Ubuntu 10.04 hosts on ec200:38
jeblairmordred: i think that tuning strategy would be good if we knew how many git operations we could handle at once00:38
SpamapS(which has mostly cleared up as they've upgraded their xen)00:38
mordredjeblair: indeed. I was going to suggest starting with number of 'cores'00:39
mordredjeblair: but that might take too long to chase00:39
jeblairclarkb: i like your idea of serving git and cgit separately... because i think we may want to tune them separately00:39
mordredjeblair: if we're going to do that ^^00:39
jeblairmordred: that's reasonable with lbaas?00:39
mordredjeblair: I guess? OR - how about for now we just spin up haproxy so that we can actually control it00:39
fungiin the past when we've been dos'd by our neighbors chewing up resources on the compute node, we've usually observed significant packet loss. in this case we're only seeing very high rtt, which suggests the kernel is being slow about processing the packets00:39
mordredjeblair: and later we can engineer it into lbaas00:40
mordredjust so that our learning curve is lower00:40
* mordred assumes the people in this room can probably tweak an haproxy machine pretty quickly00:40
jeblairoh, well, i guess we're talking about two https services, so that's trickier00:40
mordredoh yeah. good point00:41
clarkbsymlinks all cleaned up00:41
mordredjeblair: three-tier?00:41
fungiyou'd have to terminate ssl/tls on the haproxy and then do stream munging/rewriting on the plaintext http stream, which would get ugly00:41
jeblairi think the thing we can do quickly is spin up more copies of what we have00:42
mordredjeblair: haproxy in front of a couple of apache nodes with mod_proxy that do termination that proxy to different git serving machines?00:42
mordredjeblair: but yes. I think that's the quickest direct route to try first00:42
jeblairso maybe we ought to do that, stick something (haproxy or lbaas) in front of it00:42
mordredand we can furthre optimize by splitting in fancy ways later00:42
openstackgerritA change was merged to openstack-infra/config: Start trending git.o.o performance with cacti  https://review.openstack.org/4302300:42
jeblairand then come back for another pass... yeah ^00:42
mordredjeblair: also - is it worth spinning up a copy on centos, and one on precise just to see if the backends perform differently? or too much work due to how our cgit module is written?00:43
jeblairmordred: we have to figure out how to install cgit on precise 1st00:43
pleia2mordred: there is no cgit package for ubuntu (which is why we went with centos)00:43
mordredjeblair: yeah. good point. later00:43
jeblairmordred: it's _definitely_ worth it, but later, i think.00:43
mordredjeblair: last stupid question from me - given the xen theory from earlier - is it worth trying to spin up a centos node at hpcloud to see if kvm gives us more love?00:44
jeblairokay, so working plan so far: spin up git01 and git02.o.o, and front them with (haproxy on git.o.o) or (lbaas) ?00:44
mordredthese boxes don't need email really00:44
mordred(for now)00:44
jeblairmordred: i don't think it's a xen problem as much as a bad neighbor problem00:44
mordrednod00:44
pleia2mordred: well, might be worthwhile just so we have one on rackspace and one on hpcloud00:44
*** ^d has joined #openstack-infra00:44
jeblairmordred: i hear hpcloud has a particularly bad tenant.  i'd hate to be stuck near us.00:44
mordredjeblair: hahaha00:45
clarkbjeblair: hahahahah00:45
pleia2hah00:45
mordredjeblair: working plan sounds good00:45
fungiwe are the bad neighbor00:45
funginice00:45
mordredwe need to replicate to both of them from gerrit, yeah?00:45
*** woodspa has joined #openstack-infra00:45
mordredso we're going to have to bounce gerrit00:45
clarkblike a bad neighbor openstack infra is there00:45
clarkbthe jingle doesn't quite work but I laughed inside00:45
jeblairmordred: i'm still worried about hpcloud deleting nodes.  you got an email that they deleted one the other day, right?  let's put git03 in hpcloud if we want to try that.00:45
jeblairmordred: yep00:45
jeblairreplication00:45
mordredjeblair: I am too - but if we're actually going to use elastic throwaway nodes00:46
jeblairso which do we want to do, our own haproxy on git, or lbaas?00:46
openstackgerritJoshua Hesketh proposed a change to openstack-infra/zuul: Move gerrit specific result actions under reporter  https://review.openstack.org/4264400:46
openstackgerritJoshua Hesketh proposed a change to openstack-infra/zuul: Add support for emailing results via SMTP  https://review.openstack.org/4264500:46
openstackgerritJoshua Hesketh proposed a change to openstack-infra/zuul: Separate reporters from triggers  https://review.openstack.org/4264300:46
* Shrews sees lots of familiar words being thrown around.00:46
mordredjeblair: I think haproxy on git is the path of least resistance right now00:46
mordredalthough I think if it helps, we should definitely re-work to use lbaas00:47
clarkb++00:47
fungirework to use Shrews00:47
mordredjeblair: if we do lbaas right now, we'll have to do a dns swap and whatnot00:47
clarkbfungi: make Shrews do it FTFY00:47
jeblairk, one more question -- should we spin up 30g nodes, or try shrinking them a bit?00:47
* Shrews doesn't work. Try again.00:47
jeblair(i lean toward sticking with 30g until we benchmark)00:47
mordredyes. I agree00:47
mordred30g00:47
clarkbjeblair: ya, and we can go small easily once the lb is happy00:48
mordredhow much would it kill the cloud for us to snapshot git.o.o and then spin up git1 and git2 using that?00:48
jeblairwe are using almost none of the memory, but we don't really understand the cpu or network requirements yet00:48
mordred(so that we don't have to do initial clones right now)00:48
fungithe plan seems sound00:48
jeblairmordred: faster to spin up from scratch; gerrit is lightly loaded, the push won't be bad.00:48
mordredok00:48
clarkbnow that we have a plan. Will everyone hate me if I duck out to bother fungi while he is on this side of the continent?00:49
jeblairclarkb: can you stick around for a sec?00:49
clarkbsure00:50
jeblairin order to get there, we need some puppet work....00:50
clarkbah00:50
jeblairwe need git\d+.o.o defined to be a cgit/git server00:50
SpamapSdumb-but-performant-lbaas-->two modest layer 7 routing boxes-->appropriate target pools is not a terrible meme.00:50
SpamapSif lbaas does ssl, win, otherwise let the layer 7 boxes do it.00:50
jeblairSpamapS: yeah, we may come back and do l7 in the next pass00:50
jeblairand we need git.o.o defined to be an haproxy server pointing to them00:51
SpamapSOh I thought you were scaling different urls differently and that was why this was complicated?00:51
SpamapSalso.. has git->swift come up?00:51
jeblairSpamapS: that was the idea, but we're punting because it's complicated00:51
jeblairSpamapS: you aren't helping00:51
jeblair:)00:51
SpamapS:)00:51
*** dkliban has joined #openstack-infra00:52
jeblairdoes that puppet description make sense?00:52
jeblairand do we want the service and haproxy changes on each of the worker nodes as well?00:52
fungiso just splatter https connections round-robin to the pool members?00:52
mgagneCould it be that apache is not the right tool for such a use case? And I don't believe an out-of-the-box apache config is appropriate for such a setup.00:53
jeblairfungi: i think that's the idea, or whatever haproxy does (maybe it counts sockets?)00:53
mgagneI could be mistaken00:53
jeblairmgagne: we know it is not correct, someone needs to actually benchmark it and get good numbers00:53
jeblairmgagne: and we're planning on using haproxy to make the git server behave better too00:53
clarkbjeblair: as long as the node defs allow us to have nodes with digits that don't run haproxy and the one without digits that does, we should be good00:54
*** lbragstad has joined #openstack-infra00:54
fungiyeah, seems like two node defs to me00:54
clarkbfungi: yurp00:54
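A hypothetical shape for those two node defs (the class names and the balance parameters are illustrative placeholders, not the real openstack_project classes):

    # backend workers: cgit/git serving only, no load balancer
    node /^git\d+\.openstack\.org$/ {
      class { 'openstack_project::git_backend':
        balance_git => false,
      }
    }

    # the front door: same git serving plus haproxy across the workers
    node 'git.openstack.org' {
      class { 'openstack_project::git_frontend':
        balance_git      => true,
        balancer_members => ['localhost'],  # grow this as git01, git02... come online
      }
    }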
clarkbjeblair: should I start hacking something up?00:55
clarkbor are you ahead of me and looking for reviews?00:55
jeblairclarkb: no, i think we're at the point of 'looking for volunteers'00:55
mgagnejeblair: I do understand the benefit of haproxy. I would however reduce apache's keepalive timeout and increase MaxClients if the server can handle it. Serving static files shouldn't bring a server to its knees like that.00:56
clarkbjeblair: ok I can start writing the change00:56
jswarrenLooks like there are problems with grenade.00:56
mordredmgagne: well, right now we don't know what the server can handle00:56
mgagnejeblair: but these are only blind suggestions as I don't have much info on what's really going on on the server00:56
fungimgagne: well, a lot of this is cgi backend, not flat file serving00:57
jeblairmgagne: hopefully we'll have performance monitoring soon00:57
clarkbjeblair: can you or someone else diagram what they want it to look like as we have talked about a bunch of different layouts and I am not 100% sure of what we settled on00:57
mordredmgagne: and it's actually not about serving static files - it's the not-static that are a problem00:57
jeblairfungi: what's your schedule like, are you working at all this evening?00:57
mordredjswarren: we're working through some things. I do not know if that's related00:57
jeblair(i'm getting pretty close to burnout point again myself, so will probably have to pick up tomorrow)00:57
mordredjswarren: do you have a link00:57
*** ryanpetrello has joined #openstack-infra00:57
fungijeblair: i can come back and work after dinner. christine is about to bite my head off if i don't take her out to dinner and sight seeing. she's getting bored of sitting in the hotel room00:57
jswarrenhttp://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/console.html00:58
jswarrenSeen a couple like that.00:58
mgagnefungi: which CGI processes ? cgit or git-http-backend?00:58
jeblairfungi: no pressure00:58
fungimgagne: git-http-backend00:58
mordredjswarren: yes00:58
mordredhttp://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/logs/devstack-gate-setup-workspace-new.txt00:58
fungimgagne: more specifically, hundreds of git-upload-pack00:58
fungii think00:58
mordredoh. wait00:59
mgagnefungi: could it be URL without .git being served by git-http-backend instead of hitting the filesystem?00:59
mordredjeblair:00:59
*** zul has joined #openstack-infra00:59
mordredfatal: Couldn't find remote ref refs/zuul/master/Z10fb39f7b5984e1283445238278973f500:59
mordredUnexpected end of command stream00:59
mordredjeblair: is zuul also having problems? or is that a consequence of git.o.o having issues?00:59
jeblairclarkb, fungi: https://etherpad.openstack.org/git-lb00:59
fungimgagne: it could be just about anything right now. with the server misbehaving potentially causing fallback behaviors in the clients it's tough to know what the real problem is and what the secondary effects are01:00
jeblairdiagram ^01:00
jeblairmordred: was that an error?01:00
mordredjeblair: yes. in http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/logs/devstack-gate-setup-workspace-new.txt01:00
jeblairmordred: i think that's a perfectly normal error01:00
clarkbjeblair: where does ssl terminate in that diagram?01:00
mordredok.01:00
mordredgood!01:00
jeblairclarkb: my understanding of haproxy is that it proxies tcp connections,01:01
jeblairclarkb: so i think ssl terminates at the workers01:01
fungiclarkb: my assumption is ssl terminates on the individual nodes and we do at best layer 4 redirection01:01
mordredyes to ^^01:01
mordredso git1 and git2 apache should each think that they are git.o.o01:01
clarkbjeblair: fungi ok. I think we can have it terminate in haproxy but that makes it more complicated /me goes with terminating on the individual nodes01:01
mordredwhich means that the apache module is likely going to need to change, or the puppet01:01
Shrewshaproxy does ssl pass thru, but the dev version is supposed to support ssl termination01:01
Shrewsjust fyi01:01
jeblairsounds like that's the right choice for now then. :)01:02
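In other words, a plain layer-4 pass-through: haproxy never touches the TLS stream, and each worker presents the git.openstack.org certificate itself. A minimal sketch (hostnames are placeholders):

    listen git-https
        bind 0.0.0.0:443
        mode tcp
        balance leastconn
        server git01 git01.openstack.org:443 check
        server git02 git02.openstack.org:443 check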
mordredotherwise it's going to spin up apache on git1 as git1.o.o and the vhost info will be wrong - unless I'm wrong?01:02
jeblairmordred: i believe that just means "there is no zuul ref for this project"01:02
mgagnejeblair: haproxy should support both mode01:03
fungimordred: the apache module is perfectly capable of serving sites with different names than the node's name01:03
mordredjeblair: great! I was worried that there was all of a sudden another issue01:03
mordredjeblair: quick stupid suggestion - what if we stopped doing the git remote update01:03
fungimordred: unless i misunderstood the question01:04
mordredjeblair: and intead just let it do the git fetch from zuul?01:04
mordredwhich is a much more specific request for information01:04
jeblairmordred: increase the load on zuul?  rather not.  :)01:04
mordredjeblair: well, I'm just saying - it's already doing the fetch from zuul, and we're already starting from repos that are pretty close01:05
jeblairmordred: i'm not sure we'd end up with tags, etc...01:05
mordredah. k. there it is. tags for sure01:05
jeblairmordred: whatever it gets from git.o.o now it would have to get from zuul01:05
mordredyeah. k. let's call it another thing to think about later when we have more time to think01:05
jeblairmordred: yes; that would need some testing.01:06
mordredas in - rethink the flow of the states of the refs in the repos and see if we can avoid the blanket 'git remote update'01:06
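The contrast being sketched, with placeholder variables standing in for whatever zuul actually advertises:

    git remote update                            # grabs every branch and tag from git.o.o
    git fetch $ZUUL_URL/$ZUUL_PROJECT $ZUUL_REF  # grabs only the ref zuul prepared (no tags)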
*** weshay has joined #openstack-infra01:06
* mordred stops brainstorming01:06
*** rfolco has quit IRC01:06
mgagneare exported resources supported by puppet on infra?01:06
mgagneit requires storeconfigs01:07
mordredmgagne: we've never used them01:07
mordredmgagne: all of our stuff currently works via puppet apply as well as puppet agent01:07
mgagnemordred: we will forget it for now I guess =)01:07
clarkbmgagne: they are not01:08
clarkbexported resources are kind of annoying to work with iirc.01:08
clarkbbecause you need mutliple passes01:08
mordredonce we get to that level of complexity, I think we'll be happier with heat driving puppet and handing it the needed metadata01:08
mgagneclarkb: true when bootstrapping an infra, order becomes important01:09
SpamapShm, is there a recheck bug for https://review.openstack.org/#/c/42995/ .. looks like just timeouts during git clones or something01:09
SpamapS(is that what is being discussed right here right now?;)01:09
jeblairclarkb, mordred: i can't run 'nova list' for any of the hpcloud azs01:09
mordredSpamapS: yes01:10
mordredjeblair: are you getting the 400 error?01:10
jeblairyes01:10
mordredthat's what I was getting ealier01:10
*** anteaya has quit IRC01:11
mordredjeblair: I'm asking hp people01:11
mordred"We have a couple of P1 incidents still ongoing.  We're on it."01:12
mordredman, when it rains, it pours01:12
jeblairmordred: ok, thanks.  nodepool isn't going to be able to delete all those nodes until that's fixed.01:12
mordredossum01:12
jeblairmordred: which means it is constrained in what it can spin up01:12
fungii get a list out of openstackjenkins2-project1 on az-1.region-a.geo-101:13
fungialso out of az2 and az301:14
jeblairweird, i do not.01:14
jeblairthis is as root on ci-puppetmaster01:15
fungibe sure you're sourcing the openstackjenkins2 creds and not the old openstackjenkins creds?01:15
jeblairfungi: yep; i'm in a terminal i've been using for days now01:15
jeblair(screen session)01:15
*** mriedem has quit IRC01:16
fungihuh, yep01:16
fungii get the 400 from the puppet master01:16
*** tian has joined #openstack-infra01:17
mordredfungi: what's the network range of the machine you do not get 400 from01:17
mordred?01:17
mordredand what's the puppetmaster IP?01:17
fungimordred: working from 66.26.81.51 and failing from198.101.208.20401:17
mordredfungi: stellar01:18
fungioh, though on the working system i left out some of the params we define on the puppetmaster one. let me see if it's one of those01:19
mgagneuntested haproxy puppet manifest: http://paste.openstack.org/show/44704/01:19
jeblaira bunch of jobs are stuck trying to clone or update from earlier (about 1.25 hours ago)01:19
jeblairi'm aborting them01:19
funginope, using precisely the same creds on the puppetmaster as on my working system, i get a 400 error01:20
fungihow do you get novaclient to tell you what version it is?01:21
fungimy working system is running 2.14.1 from a virtualenv01:21
fungithe puppet master is running $something_older i guess01:21
mordredfungi: it works on my laptop using those creds01:22
fungiso might be something specific to the way the api calls are being made by newer vs older novaclient01:22
clarkbI will have a first draft of a change shortly01:22
mordredwait. those were old cred. lemme try new ones01:22
mordredyes. openstackjenkins2 works with the creds from puppetmaster on my laptop01:23
mordredmordred@camelot:~/src/openstack-infra/gear$ nova --version01:24
mordred2.13.0.10801:24
*** ryanpetrello has quit IRC01:24
jeblairi think the devstack jobs are cloning repos01:24
fungirather than using the cached copies?01:24
mordredjeblair: full clones?01:24
jeblairi think so01:25
fungithat would explain the sudden explosion in git load01:25
mordredwow. well, that would explain the amount of traffic01:25
jeblairi will work on that after dinner, and build new images if needed.01:26
mordredjeblair: I'll look at that too01:28
mordredjeblair: btw - nova on ci-puppetmaster is 2012.101:29
mordredso _very_ old01:29
mordredand if was working earlier01:29
mordredthen got flaky01:29
mordrednow is dead01:29
mordredso I'm asking if they did any deploys today01:29
mordredbecause they may have broken compat with 2012.1 novaclient01:29
*** gyee has quit IRC01:30
fungithat would be so awesome^Wunfortunate01:30
jeblairmordred: ERROR: HTTPSConnectionPool(host='region-a.geo-1.identity.hpcloudsvc.com', port=35357): Max retries exceeded with url: /v2.0/tokens (Caused by <class 'socket.gaierror'>: [Errno -3] Temporary failure in name resolution)01:31
jeblairmordred: i ran that with a newer novaclient on the same system01:31
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278401:31
fungihuh, that's freaky01:32
mordredjeblair, fungi: host region-a.geo-1.identity.hpcloudsvc.com worked, but took a _while_01:32
jeblairmordred: yeah, i got a partial timeout trying that too01:33
* fungi ducks out to dinner but will check back in later01:33
clarkbassuming 42784 doesn't have any syntax errors or typos I actually expect that to work01:33
clarkbit will only load balance across a single node of localhost right now01:33
*** zul has quit IRC01:36
*** jjmb has joined #openstack-infra01:38
clarkbmordred: whatever happened to IAD?01:39
mordredclarkb: I don't believe we did anything with it yet01:39
mordredclarkb: I mean, I'm not sure that patch even landed01:39
jeblairmordred: it did not land01:39
jeblairi'm not running puppet on nodepool because it's too touch and go01:40
mordred++01:40
mordredjeblair, clarkb: troy thinks IAD may be faster - worth spinning up git1 and git2 in IAD? (also, probably fewer neighbors right now)01:40
jeblairmordred: we'd be pushing updates across data centers01:41
mordredhrm. good point01:41
jeblairmordred: (not to mention pulling from them)01:41
lifelessIAD?01:41
lifelessIs that like the younger version of an IED?01:41
mordredIAD is the airport code for the Washington Dulles airport01:42
mordredlifeless: and is a new not-quite-rolled out region in rax cloud01:42
lifelessah01:42
mordredlifeless: I don't know if we've mentioned before, but all of our important servers run in rax, because hp is too ephemeral and also blocks email ports01:43
lifelessthe email thing I knew01:43
lifelessI didn't know about the ephemeral aspect; do you mean flaky?01:43
mordredthey also have not taken our feedback about how this makes them not suitable for our usecase to heart01:43
mordredyes01:43
lifelessis there a trouble ticket open on it?01:43
mordrednodes get deleted from time to time01:43
lifelessThat seems like something we should do.01:44
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278401:44
mgagnetoo fast, can't comment =(01:44
clarkbI believe that patchset will pass tests and it has had some additional cleanup done to it01:44
clarkbmgagne: you can comment on the older patchset01:44
clarkbmgagne: I will look for your comments there01:44
jeblairmordred: i believe my new scripts put all the git repos in ~ubuntu01:44
mordredjeblair: oh poop. that's not where devstack looks for them01:45
jeblairmordred: it's not the usual place, no.01:45
mgagnesup with 29418 and 19418 ?01:46
clarkbmgagne: 29418 is where the actual git-daemon will listen. Then each is fronted by an haproxy to do queueing that haproxy listens on 19418.01:47
clarkbmgagne: all so that 9418 is free on git.o.o for the world. Its a bit ugly yes01:47
clarkbbut I figure the haproxy at the front of everything shouldn't worry about queueing01:48
clarkbI could be completely wrong01:48
mgagneclarkb: I understand now, I was wondering if it was legitimate or you were typing with boxing gloves on01:49
clarkbmgagne: gotcha, thank you for checking01:49
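So per backend box the git protocol chain looks like this (a sketch; the ports are the ones named above, the maxconn value is illustrative):

    # frontend haproxy :9418  ->  local queueing haproxy :19418  ->  git-daemon :29418
    listen git-daemon-queue
        bind 127.0.0.1:19418
        mode tcp
        # queue connections here rather than piling them all onto git-daemon
        server gitdaemon 127.0.0.1:29418 maxconn 8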
mgagneclarkb: 4443?01:51
clarkbmgagne: again to not conflict with 443 on the frontend haproxy, because the frontend haproxy is sharing space with apache01:52
mordredjeblair: are you respinning/fixing? or would you like me to so you can get dinner? I'm happy to squeeze in being mildly helpful before going away01:52
clarkboh I missed the gerrit replication stuff /me adds that01:52
jeblairmordred: i have a local patch i'm about to let it test while eating, so i think i got it.01:52
mordredjeblair: ok01:53
jeblairmordred: probably applying your expertise to reviewing the haproxy thing would be helpful01:53
mgagneclarkb: my mistake, sorry01:53
*** ftcjeff has joined #openstack-infra01:53
lifelessjeblair: url? [I have haproxy knowledge, for my sins]01:54
lifelessclarkb: one of the most useful things haproxy can do for maintaining consistent response time is to cap the concurrent backend work that is being permitted01:54
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278401:54
lifelessclarkb: so it totally should be managing the queue01:54
clarkblifeless: yeah that is what we are using it for01:54
lifelessclarkb: then 13:48 < clarkb> but I figure the haproxy at the front of everything shouldn't worry about queueing01:55
lifelessclarkb: has me confused01:55
clarkblifeless: just not the frontend haproxy. We have three layers for git-daemon. The middle haproxy worries about the queue01:55
clarkblifeless: haproxy 9418 -> haproxy 19418 -> git-daemon 2941801:55
lifelessclarkb: cross-meshed across two machines ?01:55
clarkblifeless: https://review.openstack.org/#/c/42784/01:55
clarkblifeless: no I am not worrying about the ha part of ha proxy right now01:56
clarkblifeless: https://review.openstack.org/#/c/42784/6/modules/cgit/manifests/init.pp has the most interesting bits in it01:56
lifelessclarkb: ok, so whats the front end haproxy for ?01:56
clarkblifeless: load balancing01:56
lifelessclarkb: what are the middle ones for then ?01:57
clarkblifeless: queueing01:57
openstackgerritKui Shi proposed a change to openstack-dev/hacking: Improve H202 to cover more cases  https://review.openstack.org/4302901:57
lifelessclarkb: that doesn't make sense to me01:57
clarkblifeless: git-daemon needs queueing otherwise it just goes nuts. A simple haproxy -> gitdaemon gives us that01:58
*** ryanpetrello has joined #openstack-infra01:58
lifelessclarkb: ok; so why isn't that the front haproxy ?01:58
clarkblifeless: reason #1 is to make it easier to consume lbaas01:58
clarkbthe thing that the simple haproxy in front of gitdaemon is doing is something that our lbaas providers cannot do01:58
lifelessthe lbaas apis don't expose the full capabilities of haproxy like queue depth limits etc?01:59
clarkbbut everything in the frontend haproxy issomething that could be replaced with lbaas01:59
clarkblifeless: they do not01:59
lifelesssadface01:59
clarkbyou can set per host throttles01:59
clarkbthat is it01:59
*** lbragstad has quit IRC02:00
lifelessclarkb: so frankly, I wouldn't use lbaas then; you want queuing handled at the front end, and ha in the middle tier02:00
*** wenlock has joined #openstack-infra02:00
lifelessclarkb: but - this is your team's call; I'm just coming from my running-busy-site-with-haproxy-squid-etc-etc background02:00
mordredlifeless: the main thing we're trying to get with this is just _some_ headroom without reengineering the whole thing yet02:01
mordredwe'd like to do a better/deeper re-architecture02:01
clarkblifeless: thanks, it is good to know. And yes we do plan on actually testing and engineering this stuff02:02
clarkbbut right now we need a thing that works02:02
mordredbut we need to actually analyze what's going on and what our capacity is etc  - get real numbers/baselines02:02
lifelessclarkb: whats deployed right now?02:02
mordredyah. what he said02:02
lifelessclarkb: all three layers?02:02
mordredlifeless: a single apache server serving git02:02
lifelessok02:02
clarkblifeless: and xinetd in front of git-daemon02:02
clarkbwhich is bonghits02:03
lifelessso two layers of haproxy will work, but if you want to keep it simpler - which I encourage - I'd just deploy a single haproxy frontend02:03
SpamapSsimple is for the weak02:03
clarkbI am going to manually install puppet-haproxy on the puppet master so that we can use dev envs with this change02:03
lifelessand ignore lbaas for now, because what you want right now is breathing room.02:03
mordredclarkb: ++02:03
mordredlifeless: yes. we are ignoring lbaas for now for sure02:04
SpamapShttp://terrorobe.soup.io/post/132401460/Downtime-is-sexy-Josh-Berkus-of-PostgreSQL02:04
SpamapS:)02:04
clarkbmordred: of course if I can't ssh to that server I might not install puppetlabs-haproxy02:04
clarkbmordred: are you able to get in?02:04
mordredci-puppetmaster?02:05
clarkboh now it accepts my connection02:05
clarkbmordred: yeah02:05
*** zul has joined #openstack-infra02:05
mordredclarkb: # TODO add additional git servers here.02:05
clarkbmordred: you like that?02:06
*** markmcclain has quit IRC02:06
mordredclarkb: so, if I'm reading this right...02:06
clarkbmordred: I think this may be a case of getting everything going on git.o.o first. Then building the new hosts and kicking everything02:06
lifelessclarkb: looking at this I really think one haproxy is better02:06
mordredclarkb: yes. so deploy the haproxy on git.o.o that haproxies localhost02:06
mordredclarkb: right?02:07
lifelessclarkb: your configuration could give you terrible latency as it stands02:07
clarkbmordred: yup02:07
lifelessclarkb: in overload situations02:07
mordredand then add the additional git servers to it02:07
mordredlifeless, clarkb: lifeless suggestion should be easy enough to test- set balance_git to false on git1 and git202:07
clarkblifeless: it could. the git-daemon stuff won't actually be heavily used immediately so we can work it out02:08
clarkblifeless: the http(s) stuff is the immediate concern02:08
clarkbci-puppetmaster seems to have network trouble too02:08
mordredyeah.02:08
clarkbI can't git fetch my change02:08
mordredclarkb: check load02:08
clarkbmordred: its 1.5 ish02:09
mordredclarkb: is salt running again?02:09
mordredit should be 002:09
clarkbhmm it is salt master again. I am going to kill that thing with fire02:09
mordredyup. salt-master02:09
mordredI believe puppet is going to re-launch him for us02:09
clarkbheh all better now02:09
clarkbmordred: ugh02:09
mordredclarkb: might be worth disabling puppet agent on puppetmaster for a sec02:09
clarkbmordred: ok I will do that. mordred do you want to write a puppet change to disable it?02:10
mordredyes02:10
mordredon it02:10
clarkb*to disable salt master02:10
lifelessclarkb: nearly finished02:12
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Disable salt master and minions globally  https://review.openstack.org/4303002:12
mordredclarkb: I hit it with a stick02:13
mordredclarkb: our salt class wasn't really written with disabling in mind02:13
mordredand I didn't want to run the risk of deleting the key info02:13
lifelessclarkb: ok, reviewed.02:13
clarkbI have stopped puppet on git.o.o as well. I am going to run puppet agent --test --environment development --noop there02:13
clarkblifeless: thank you for looking02:14
lifelessclarkb:tl;dr - one haproxy, set a backlog of 200 or so, make sure you have maxconn and maxqueue set for each backend02:14
lifelessclarkb: the backlog affects when clients get an error rather than a long pause during overload; the maxconn prevents overloading a backend, and the maxqueue is about signaling overload and errors early02:15
clarkblifeless: so maxqueue is different than the conn backlog?02:16
clarkblifeless: also, for whatever it is worth we seem to be very bursty eg after a gate reset02:16
mordredyeah, I think we're quite ok with things sitting in backlog wait for a while02:16
clarkblifeless: so having a longer backlog where things wait their turn is better than failing a bunch of tests02:16
clarkblifeless: or at least that was the theory02:17
lifelessclarkb: yes, they are different.02:17
lifelessclarkb: so I suggest get it up and working and then tune the numbers up02:17
clarkbthat is the plan02:17
lifelessclarkb: backlog holds things in SYN without SYN-ACK02:17
clarkbdoes maxqueue hold things after a handshake?02:18
lifelessyes02:18
lifelessthere is a TCP timeout on backlog02:19
lifelessso you really don't want it too long02:19
lifelesslet me dig that up02:19
lifelesshttp://www.ietf.org/mail-archive/web/tcpm/current/msg07472.html02:19
lifeless0,3 etc seconds02:19
lifelessand folk are talking about reducing it02:19
clarkblifeless: I don't see a maxqueue in the keyword list at http://code.google.com/p/haproxy-docs/wiki/balance02:20
lifelessyou really can't sanely avoid errors by making the backlog high02:20
lifelessclarkb: huh, ignore the wiki, useless.02:20
lifelessclarkb: http://haproxy.1wt.eu/download/1.3/doc/configuration.txt02:20
lifelessmaxqueue <maxqueue>02:20
lifeless  The "maxqueue" parameter specifies the maximal number of connections which02:20
lifeless  will wait in the queue for this server. If this limit is reached, next02:20
lifeless...02:20
clarkblifeless: https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/params.pp#L10-L34 are the default that we would use if we don't explicitly set them02:20
*** cwj has left #openstack-infra02:20
clarkbnot sure why there are two different maxconn values02:21
lifelessclarkb: one is on the server, one is on frontend02:21
lifelessclarkb: they are wholly different and its terrible it has the same name02:21
clarkb8k is server wide and 4k is frontend specific?02:22
lifelessclarkb: not sure about the puppet mapping; sorry - that maxconn mentionI made was about the global thing vs server backend limits02:23
lifelessthose defaults look non-terrible to me.02:23
lifelessclarkb: anyhow, backlog has to be less than 3s - (max RTT/2) to avoid retransmits of SYN02:24
lifelessclarkb: which would just add overhead.02:24
lifelessclarkb: so yeah, way lower than you have it.02:24
lifelessclarkb: use the queue timeout value and maxqueue to control how long something can be queued, and how many things can be queued for a server.02:25
lifelessclarkb: HTH, I need to run for a bit; ping here and I will happily review again - or if you can get me a rendered haproxy config I'm very happy climbing through those02:25
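Pulling lifeless' suggestions together, the knobs land roughly like this in an haproxy listen block (all numbers and hostnames here are illustrative, not tuned values):

    listen git-http
        bind 0.0.0.0:443
        mode tcp
        backlog 200          # un-ACKed SYNs; keep it short, SYN retransmit kicks in around 3s
        timeout queue 60s    # how long an accepted connection may wait for a server slot
        # maxconn caps concurrent work per backend; maxqueue bounds how many wait for it
        server git01 git01.openstack.org:443 check maxconn 64 maxqueue 256
        server git02 git02.openstack.org:443 check maxconn 64 maxqueue 256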
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278402:26
openstackgerritClark Boylan proposed a change to openstack-infra/config: Swap git daemon in xinetd for service  https://review.openstack.org/4301202:26
clarkbmordred: lifeless ^ that fixes a bug the noop puppet run found in pleia2's change02:26
clarkblifeless: thank you. I think I am going to try ramming this in with the http stuff then fixup the gitdaemon stuff in a subsequent change02:27
clarkbthough maybe that is more work than it is worth02:27
clarkbI think I am going to take this as an opportunity to head home02:28
clarkbrerunning noop puppet really quickly with the latest patchset02:29
*** jerryz has quit IRC02:33
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278402:34
clarkbpuppet noop is being slow so I went ahead and fiddled with lifeless' suggestions02:34
*** yaguang has joined #openstack-infra02:34
jeblairhpcloud seems to be working better now, and nodepool seems to be doing a better job deleting nodes now02:35
*** morganfainberg is now known as morganfainberg|a02:36
clarkbjeblair: mordred: the current noop run looks mostly clean. There is one error but I think it is related to puppet not copying files locally because it is in noop mode02:36
clarkbjeblair: mordred do we want to attempt applying it?02:37
clarkbI think rolling back will involve stopping haproxy, and reapplying old puppet to get the old apache configs back02:37
jeblairclarkb: i'm about to check out for the night (i'm past my point of uselessness), so i'd say: your call02:37
*** adalbas has quit IRC02:37
clarkbjeblair: I am feeling a bit like that too02:37
jeblairclarkb: i'm mostly sticking around to fix the image thing (which should reduce the criticality of the git thing)02:38
clarkbprobably best to hold off on git for now02:38
jeblairclarkb: sounds like the way to go.02:38
jeblairclarkb: see how i'm talking?  "image thing"  "git thing"?02:38
jeblairuseless02:38
*** xchu has joined #openstack-infra02:39
clarkb:) I am beat02:39
* clarkb heads home. Tomorrow we can hit this thing with a giant stick02:39
lifelessooh, stick.02:39
clarkblifeless: are my numbers at https://review.openstack.org/#/c/42784/8/modules/cgit/manifests/init.pp any better?02:40
lifelessclarkb: btw - http://code.google.com/p/haproxy-docs/wiki/ServerOptions is where maxqueue is covered02:40
*** rcleere has joined #openstack-infra02:40
lifelessclarkb: it looks like that wiki is just machine-processed from the docs02:40
lifelessclarkb: I don't know if maxqueue there will end up in the right place; but the literal numbers are saner yes.02:41
mordredjeblair: you're sounding like me!02:42
clarkblifeless: ya, I was worried about it not ending up in the right place after reading the maxqueue doc02:42
lifelessclarkb: I'm not confident they are right in any shape, but then I have a different model for how failures should go in my head :)02:42
jeblairmordred: i just put "sudo -u ubuntu" in a script and wondered why it didn't run as jenkins.02:42
lifelessclarkb: and now is not the time to run through that given tired + fire drill02:42
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278402:45
clarkblifeless: ^ that puts maxqueue in the correct spot and now I am getting off of IRC02:45
clarkbmordred: don't have too much fun on the playa. it will only make feature freeze less enjoyable for the rest of us :)02:46
mordredclarkb: well, you can also look at it the other way...02:46
mordredclarkb: I will be hitting the extreme weather conditions of the high desert, an arid desert with an abnormally basic pH02:46
mordredclarkb: where the only running water, food, electricity or trash service are the ones I bring myself02:47
mordredclarkb: surrounded by 60k people who are all in various stages of mind alteration who are walking around with things on fire02:47
clarkbmordred: good point. You have basically described why I would have a hard time doing it myself :)02:47
mordred:)02:48
jeblairmordred: how is that different than feature freeze?02:48
lifelessjeblair: more dust?02:49
jeblairlifeless: that must be it02:49
*** jhesketh has quit IRC02:52
*** melwitt1 has quit IRC02:52
*** jhesketh has joined #openstack-infra02:57
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Move setup scripts destination  https://review.openstack.org/4303302:57
jeblairmordred: around?  part 1 of my fix ^03:00
mordredjeblair: looking03:00
*** dims has quit IRC03:00
mordredjeblair: and the difference is the alkalinity03:00
jeblairmordred: i don't actually need to merge that one (i can just run it in place)03:00
jeblairmordred: the next one i will need to merge03:01
mordredjeblair: +2 anyway03:01
SpamapSI am so jealous of you guys03:02
SpamapSI haven't scaled anything in years. :-P03:03
mordredSpamapS: you're always welcome in here03:04
jeblairSpamapS: it's all yours if you want it.  :)03:04
mordredSpamapS: there's plenty to go around03:04
mordredSpamapS: also, just wait until we start using heat for some of this03:04
SpamapSuh err, no I'm busy with my theoretical scaling things in Heat.03:04
jeblairSpamapS: the team scales horizontally too03:04
mordredSpamapS: we'll have excellent real-world feedback for you03:04
SpamapSI actually can't wait03:04
*** jog0 is now known as jog0-away03:04
mordredyou say that now...03:04
jeblairSpamapS: it comes in the form of mordred yelling03:05
mordrednothing is ever quite so fun as watching the thundering herd of feature freeze come your way03:05
SpamapSYou guys will thank me that I got this one done: https://bugs.launchpad.net/heat/+bug/1214580 :)03:05
uvirtbotLaunchpad bug 1214580 in heat "Heat does not make use of the C libyaml parser." [High,In progress]03:05
jeblairthat's some serious scaling03:05
mordredSpamapS: is libyaml web-scale?03:05
SpamapSmordred: its not. It doesn't use /dev/null.03:05
mordredSpamapS: I mean, I've heard that /dev/null processes yaml faster03:05
mordreddammit03:06
mordredyou were quicker03:06
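The gist of the libyaml bug is the usual loader-selection pattern: prefer the C extension when it is importable and fall back to the pure-python parser otherwise (a generic sketch, not Heat's actual patch):

    import yaml

    try:
        from yaml import CSafeLoader as SafeLoader  # libyaml-backed, much faster
    except ImportError:
        from yaml import SafeLoader                 # pure-python fallback

    def parse(text):
        return yaml.load(text, Loader=SafeLoader)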
SpamapSNeed to tackle the ORM insanity though.. https://bugs.launchpad.net/heat/+bug/121460203:06
uvirtbotLaunchpad bug 1214602 in heat "stack_list loads all resource from the database via the ORM" [Medium,Triaged]03:06
*** woodspa has quit IRC03:06
mordredSpamapS: oh, have fun with that03:06
SpamapS100 stacks, 10 resources each == 1000 sql queries to do 'heat stack-list'03:06
SpamapSactually probably 1100 sql queries03:06
mordredSpamapS: now that IS web-scale03:08
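The 1 + N query pattern being described, shown with toy models (these Stack/Resource classes are illustrative stand-ins, not Heat's real ORM):

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

    Base = declarative_base()

    class Stack(Base):
        __tablename__ = 'stack'
        id = Column(Integer, primary_key=True)
        name = Column(String(255))
        resources = relationship('Resource')   # lazy-loaded by default

    class Resource(Base):
        __tablename__ = 'resource'
        id = Column(Integer, primary_key=True)
        stack_id = Column(Integer, ForeignKey('stack.id'))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = Session(engine)

    # stack-list the slow way: 1 query for the stacks + 1 more per stack for resources
    listing = [(s.name, len(s.resources)) for s in session.query(Stack).all()]

    # eager-load instead (or skip loading resources entirely for a bare list)
    stacks = session.query(Stack).options(joinedload(Stack.resources)).all()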
SpamapShttps://bugs.launchpad.net/heat/+bug/121423903:10
uvirtbotLaunchpad bug 1214239 in heat "Infinitely recursing stacks reach python's maximum recursion depth" [Medium,Triaged]03:10
SpamapSmordred: ^^ thats what I'm working on right now03:10
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Fix nodepool setup scripts  https://review.openstack.org/4303703:13
jeblairmordred: review+aprv ^ ?03:13
mordredjeblair: wow. you might actually almost enjoy the weather on playa this year: http://www.weather.com/weather/tenday/Gerlach+NV+USNV003303:13
mordredjeblair: looking03:13
jeblairmordred: wow, not bad.  i could dig that.03:14
jeblairmordred: maybe i'll go to the desert next door.03:14
mordredjeblair: why get rid of pushd/popd? (curious)03:14
jeblairmordred: don't care about the current dir anymore; there's an explicit cd to the script dir at the bottom03:15
mordredjeblair: ah. see it03:15
jeblairmordred: (cwd should now be ~jenkins instead of the script dir)03:15
jeblairmordred: (because of the sudo)03:15
mordred+2 - want me to aprv?03:15
jeblairmordred: pls03:16
mordreddone03:16
jeblairmy local instance is spinning up a host from an image from that now; i'll double check its sane and then apply03:16
mordred++03:17
jeblairmordred: then i'll set the image state to delete for one of the providers, which should automatically build a new one03:18
jeblairand do that one at a time03:18
openstackgerritA change was merged to openstack-infra/config: Fix nodepool setup scripts  https://review.openstack.org/4303703:19
jeblairmordred: even though i set one image to deleted, it's going to rebuild all of them.03:26
jeblairso, um, hopefully it will work.  :)03:26
jeblair(the old ones will still be there, so we can roll back if we need to; it's just going to be a little less incremental than i'd hoped)03:27
jeblairmordred: i think the post jobs need to fetch from review.o.o; the replication to git.o.o isn't fast enough03:31
jeblairmordred: (or else, we should catch that error and retry in g-g-p)03:32
jeblairmordred: https://jenkins02.openstack.org/job/openstack-admin-manual-netconn/47/console03:32
*** blamar has quit IRC03:32
*** mberwanger has joined #openstack-infra03:34
*** blamar has joined #openstack-infra03:34
*** ^d has quit IRC03:44
jeblairmordred: that image seems to be good; it's no longer cloning repos03:46
mordredjeblair: awesome03:47
jeblairunfortunately, the image build process for az2 was disconnected, so it's still using the old one03:47
mordredjeblair: and yes  re: post fetch from review.o.o03:47
jeblairi'll kick off another image build, hopefully az2 will succeed this time03:47
mordredsigh03:47
jeblairmordred: the good news is that at this point, it will keep making new nodes from az1 and az303:48
jeblairand will only start using az2 again if that image update succeeds03:48
jeblairmordred: so starting from right now, no new nodes should be created from the old images03:48
*** nayward has joined #openstack-infra03:57
openstackgerritJason Meridth proposed a change to openstack-dev/hacking: Adds ability to ignore hacking validations with noqa  https://review.openstack.org/4171303:58
jeblairmordred: az2 failed again04:01
*** ftcjeff has quit IRC04:01
jeblairi'm going to leave it as is, maybe it'll get better overnight.04:01
mordredkk04:06
*** afazekas has joined #openstack-infra04:09
*** vogxn has joined #openstack-infra04:09
mgagnewow, git.o.o interface is blazing fast now04:13
mordredmgagne: whee!04:17
mordredmgagne: it helps when it's not being pummelled to death by zuul jobs04:17
mgagnemordred: well, we can say it was a great benchmark04:18
mordredmgagne: we always learn MANY MANY things during feature freeze04:18
mgagnemordred: haha, I was questioning the timing of such an update =)04:18
mordredmgagne: we knew the rush was coming, we've been trying to get enough new tech in place to handle it04:19
mordredmgagne: part of this rush was that we removed one of the bottlenecks from last time by making that part of the system better04:19
mordredmgagne: and have thus found the next piece in the puzzle :)04:19
mgagnemordred: =)04:19
*** senk has quit IRC04:20
*** sridevi has joined #openstack-infra04:22
*** sridevi has left #openstack-infra04:22
*** sridevi has joined #openstack-infra04:23
*** wenlock has quit IRC04:23
*** sridevi has quit IRC04:32
jeblairmordred: we now have a full set of images for all the providers04:33
*** mberwanger has quit IRC04:34
*** morganfainberg|a is now known as morganfainberg04:35
*** mberwanger has joined #openstack-infra04:38
* fungi is caught back up and reviewing the outstanding bits. glad the source of the pummeling was figured out04:38
fungieven with the performance issues we had, the graph says we still spiked up to 600jph today04:40
jeblairmordred, clarkb: there's a boat load of hpcloud servers stuck in "ACTIVE(deleting)" state; we may need to open a trouble ticket if they're still around tomorrow04:40
jeblairthe neutron job seems to be flakey :(04:42
*** nati_ueno has joined #openstack-infra04:42
*** nayward has quit IRC04:46
*** Anju has joined #openstack-infra04:54
*** ladquin is now known as ladquin_afk04:55
*** thomasbiege1 has joined #openstack-infra05:01
*** thomasbiege1 has quit IRC05:01
*** mirrorbox has quit IRC05:05
*** mberwanger has quit IRC05:06
*** ogelbukh has quit IRC05:06
*** enikanorov-w has quit IRC05:08
*** enikanorov-w has joined #openstack-infra05:10
*** sridevi has joined #openstack-infra05:15
*** rcleere has quit IRC05:23
srideviHi, can anyone help me debug the jenkins' failure in https://review.openstack.org/#/c/34801/05:28
sridevianyone?05:29
srideviaround?05:29
*** DennyZhang has joined #openstack-infra05:33
*** nicedice_ has quit IRC05:37
*** UtahDave has joined #openstack-infra05:47
*** DennyZhang has quit IRC05:55
openstackgerritA change was merged to openstack/requirements: Remove upper bounds on lifeless test libraries  https://review.openstack.org/4251505:55
*** vogxn has quit IRC05:57
*** cody-somerville has quit IRC05:57
*** sridevi has quit IRC05:57
openstackgerritA change was merged to openstack/requirements: Add dogpile.cache>=0.5.0 to requirements  https://review.openstack.org/4245505:58
*** vogxn has joined #openstack-infra05:58
*** w_ has joined #openstack-infra06:02
*** olaph has quit IRC06:05
*** ryanpetrello has quit IRC06:11
*** vogxn has quit IRC06:11
*** cody-somerville has joined #openstack-infra06:13
*** nayward has joined #openstack-infra06:17
*** vogxn has joined #openstack-infra06:20
*** Dr0id has joined #openstack-infra06:20
*** dmakogon_ has joined #openstack-infra06:24
*** Dr0id has quit IRC06:25
*** annegentle has quit IRC06:25
*** odyssey4me4 has joined #openstack-infra06:25
*** psedlak has joined #openstack-infra06:30
*** annegentle has joined #openstack-infra06:30
*** AJaeger has joined #openstack-infra06:33
*** sridevi has joined #openstack-infra06:34
*** afazekas has quit IRC06:44
*** jinkoo has joined #openstack-infra06:51
*** ruhe has joined #openstack-infra06:52
*** Guest75819 has quit IRC06:56
openstackgerritMark McLoughlin proposed a change to openstack/requirements: Allow use of oslo.messaging 1.2.0a10  https://review.openstack.org/4306007:04
*** lillie has joined #openstack-infra07:06
*** lillie is now known as Guest1633107:06
*** stevebaker has quit IRC07:07
*** Dr01d has joined #openstack-infra07:10
*** stevebaker has joined #openstack-infra07:12
sridevianyone around?07:14
srideviI'm having trouble debugging the devstack neutron failures07:15
*** stevebaker has quit IRC07:18
*** thomasbiege1 has joined #openstack-infra07:18
*** jinkoo has quit IRC07:19
*** yonglihe_ has joined #openstack-infra07:19
yonglihe_hello, seems the Jenkins build machine had a problem,07:20
yonglihe_2013-08-21 06:11:42.794 | Started by user anonymous07:20
yonglihe_2013-08-21 06:11:42.797 | [EnvInject] - Loading node environment variables.07:20
yonglihe_2013-08-21 06:11:42.833 | Building remotely on centos6-7 in workspace /home/jenkins/workspace/gate-nova-python2607:20
yonglihe_2013-08-21 06:11:42.866 | [gate-nova-python26] $ /bin/bash -xe /tmp/hudson2665365283182338716.sh07:20
yonglihe_2013-08-21 06:11:42.873 | + /usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org https://git.openstack.org07:20
yonglihe_2013-08-21 06:11:42.877 | Triggered by: https://review.openstack.org/3507407:20
yonglihe_2013-08-21 06:11:42.877 | + [[ ! -e .git ]]07:20
yonglihe_2013-08-21 06:11:42.878 | + git remote set-url origin https://git.openstack.org/openstack/nova07:20
yonglihe_2013-08-21 06:11:42.882 | + git remote update07:20
yonglihe_2013-08-21 06:11:42.889 | Fetching origin07:20
yonglihe_2013-08-21 06:51:42.842 | Build timed out (after 40 minutes). Marking the build as failed.07:21
yonglihe_2013-08-21 06:51:42.934 | fatal: The remote end hung up unexpectedly07:21
yonglihe_2013-08-21 06:51:42.939 | error: Could not fetch origin07:21
yonglihe_2013-08-21 06:51:42.941 | + git remote update07:21
yonglihe_2013-08-21 06:51:42.949 | Fetching origin07:21
yonglihe_http://logs.openstack.org/74/35074/24/check/gate-nova-python26/5227601/console.html07:21
*** pblaho has joined #openstack-infra07:21
yonglihe_sorry for the long log07:21
*** stevebaker has joined #openstack-infra07:21
*** sridevi has quit IRC07:24
morganfainbergyonglihe_: i'm sure it's not a worry, but next time (to avoid the long log) use a paste (e.g. http://paste.openstack.org/ )07:25
morganfainberg(that way you can reference it again if needed as well w/o having to hunt for it)07:25
*** kspear has quit IRC07:25
*** xBsd has joined #openstack-infra07:27
*** DennyZhang has joined #openstack-infra07:28
*** dmakogon_ has quit IRC07:28
*** shardy_afk is now known as shardy07:29
*** michchap has quit IRC07:29
*** GheRivero has quit IRC07:30
*** kspear has joined #openstack-infra07:30
*** thomasbiege1 has quit IRC07:33
*** GheRivero has joined #openstack-infra07:35
*** dmakogon_ has joined #openstack-infra07:37
*** kspear has quit IRC07:40
yonglihe_thanks morganfainberg, i got it07:40
*** boris-42 has joined #openstack-infra07:42
*** jpich has joined #openstack-infra07:51
*** nati_uen_ has joined #openstack-infra07:54
*** michchap has joined #openstack-infra07:54
yonglihe_http://paste.openstack.org/show/44724/07:54
*** michchap has quit IRC07:54
yonglihe_seems something got lost, but i can not find which machine this is.07:55
*** michchap has joined #openstack-infra07:55
*** fbo_away is now known as fbo07:55
*** nati_ueno has quit IRC07:57
*** vogxn has quit IRC07:57
*** mikal has joined #openstack-infra07:59
*** GheRivero has quit IRC08:02
*** GheRivero has joined #openstack-infra08:02
*** michchap has quit IRC08:03
*** GheRivero has quit IRC08:03
*** GheRivero has joined #openstack-infra08:04
*** xchu has quit IRC08:05
*** nayward has quit IRC08:10
*** nayward has joined #openstack-infra08:11
*** dmakogon_ has quit IRC08:11
*** moted has quit IRC08:11
*** EntropyWorks has quit IRC08:11
*** soren has quit IRC08:11
*** mindjiver has quit IRC08:11
*** clarkb has quit IRC08:11
*** rockstar has quit IRC08:11
*** echohead has quit IRC08:11
*** jeblair has quit IRC08:11
*** echohead has joined #openstack-infra08:12
*** mindjiver has joined #openstack-infra08:12
*** jeblair has joined #openstack-infra08:12
*** clarkb has joined #openstack-infra08:12
*** EntropyWorks has joined #openstack-infra08:12
*** soren has joined #openstack-infra08:12
*** soren has quit IRC08:12
*** soren has joined #openstack-infra08:12
*** moted has joined #openstack-infra08:12
*** Kiall has quit IRC08:13
*** rockstar has joined #openstack-infra08:13
*** AJaeger has quit IRC08:13
*** kiall has joined #openstack-infra08:15
*** vogxn has joined #openstack-infra08:20
*** GheRivero has quit IRC08:20
*** xchu has joined #openstack-infra08:21
*** GheRivero has joined #openstack-infra08:21
*** xBsd has quit IRC08:22
*** GheRivero has quit IRC08:22
*** GheRivero has joined #openstack-infra08:22
*** GheRivero has quit IRC08:29
*** GheRivero has joined #openstack-infra08:29
*** xBsd has joined #openstack-infra08:32
*** michchap has joined #openstack-infra08:34
*** michchap has quit IRC08:42
*** UtahDave has quit IRC08:45
*** Dr01d has quit IRC08:45
*** Dr01d has joined #openstack-infra08:46
*** DennyZha` has joined #openstack-infra08:53
*** DennyZhang has quit IRC08:55
*** xBsd has quit IRC08:55
*** jpich has quit IRC08:57
*** jpich has joined #openstack-infra08:59
*** BobBall_Away is now known as BobBall09:06
*** yaguang has quit IRC09:07
*** xchu has quit IRC09:07
*** yaguang has joined #openstack-infra09:09
*** yaguang has quit IRC09:14
*** ruhe has quit IRC09:16
*** ruhe has joined #openstack-infra09:18
*** xchu has joined #openstack-infra09:20
*** yaguang has joined #openstack-infra09:27
*** nayward has quit IRC09:28
*** yaguang has quit IRC09:35
*** ruhe has quit IRC09:37
*** kspear has joined #openstack-infra09:37
*** xBsd has joined #openstack-infra09:39
*** yaguang has joined #openstack-infra09:42
*** nayward has joined #openstack-infra09:52
*** markmc has joined #openstack-infra09:54
*** DennyZha` has quit IRC10:01
*** pcm_ has joined #openstack-infra10:04
*** pcm_ has quit IRC10:06
*** pcm_ has joined #openstack-infra10:06
*** boris-42 has quit IRC10:09
*** ruhe has joined #openstack-infra10:16
*** xchu has quit IRC10:19
*** Shrews has quit IRC10:27
*** Shrews has joined #openstack-infra10:36
*** nati_uen_ has quit IRC10:39
*** markmcclain has joined #openstack-infra10:39
*** xBsd has quit IRC10:39
*** xBsd has joined #openstack-infra10:40
markmcanyone seeing zuul miss events ?10:49
* markmc just pushed ~30 nova patches and there's only 10 in the check queue10:49
markmcmaybe it's just catching up10:50
markmcah, yeah10:50
markmc1 added every 30 seconds or so10:50
*** vogxn has quit IRC10:50
*** SergeyLukjanov has joined #openstack-infra10:50
markmcoh, god, I shouldn't watch the zuul dashboard10:52
markmcthis failure: https://jenkins02.openstack.org/job/gate-swift-devstack-vm-functional/94/console10:52
markmcjust aborted 18 changes in the gate queue10:53
markmctragic10:53
*** SergeyLukjanov has quit IRC10:54
openstackgerritStuart McLaren proposed a change to openstack/requirements: Bump python-swiftclient requirement to >=1.5  https://review.openstack.org/4309210:54
*** ruhe has quit IRC10:59
*** yaguang has quit IRC11:00
*** ruhe has joined #openstack-infra11:01
*** SergeyLukjanov has joined #openstack-infra11:03
*** whayutin_ has joined #openstack-infra11:04
mordredmarkmc: morning11:04
markmcmordred, howdy11:05
mordredmarkmc: at oscon we discussed an idea about how to speculatively deal with the scenario you tweeted about11:05
*** dina_belova has joined #openstack-infra11:05
*** weshay has quit IRC11:05
markmcmordred, the "no! no! no! zuul! don't do it! nooooo!" scenario? :)11:06
mordredmarkmc: it gets complex, so it's not going to happen this cycle, but there is a way we could use WAY more resources to start a new virtual queue based on the now-presumptive state of the world11:06
mordredmarkmc: yeah11:06
mordredthe reason we leave those jobs aborted currently is that we don't know if changes 1 and 2 will fail or not - so we wait for the queue head of the aborted jobs to resolve11:06
mordredif we restarted them currently, we'd essentially need to start building a tree rather than a plain queue11:07
mordredbut - we had a good chat about it11:07
mordred:)11:07
markmcnot sure I follow, but definitely an interesting subject :)11:07
markmcnow go have fun offline :)11:08
mordredI think now that we have gearman, multi-jenkins and the new nodepool code - we'll be set nicely to think about things like that next cycle11:08
mordredmarkmc: I have 5 hours of plane flights before I get to do that11:08
markmcah, lovely11:08
mordredyah.11:08
markmcuse that time wisely11:08
* mordred is hoping that he can provide _some_ usefulness after how hectic the past two days have been11:09
markmclike replying to all your linkedin recruiter spam11:09
mordredjeez11:09
mordredthat's not possible11:09
mordredalthough, I've learned that there is a Java Opportunity in Studio City, CA11:09
markmcI contemplated hacking on gerrit's topic review support briefly yesterday11:09
markmcvery briefly11:09
mordredhahahaha11:10
*** whayutin_ is now known as weshay11:11
mordredgit.o.o is running warm, but not dying currently:11:14
mordredhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=854&rra_id=all11:14
mordredgiven the current length of the queue, i'm going to take that as a good thing11:15
mordredload average of 11 - all cpus at around 80%11:16
mordredWOW11:18
*** boris-42 has joined #openstack-infra11:19
mordredswift change 28892 has been in the gate queue for 12H11:19
mordredbut it MIGHT merge in 11 minutes11:20
mordredafter which point we will have a gate reset event :)11:20
mordredand everyone can watch the thundering herd clone from git.o.o11:21
*** xBsd has quit IRC11:26
*** xBsd has joined #openstack-infra11:31
*** BobBall has quit IRC11:37
*** lcestari has joined #openstack-infra11:41
*** zehicle_at_dell has joined #openstack-infra11:41
*** nayward has quit IRC11:45
*** dina_belova has quit IRC11:45
*** xBsd has quit IRC11:45
*** AJaeger has joined #openstack-infra11:47
*** xBsd has joined #openstack-infra11:49
*** dims has joined #openstack-infra11:52
*** apcruz has joined #openstack-infra11:54
*** zehicle_at_dell has quit IRC11:55
*** AJaeger has quit IRC11:59
*** ruhe has quit IRC12:00
*** AJaeger has joined #openstack-infra12:02
*** BobBall has joined #openstack-infra12:03
*** dprince has joined #openstack-infra12:03
*** psedlak has quit IRC12:04
*** michchap has joined #openstack-infra12:06
*** michchap has joined #openstack-infra12:07
*** AJaeger has quit IRC12:11
*** SergeyLukjanov has quit IRC12:11
*** zehicle_at_dell has joined #openstack-infra12:12
openstackgerritJulien Danjou proposed a change to openstack-infra/config: Add py33 jobs for WSME  https://review.openstack.org/4311212:16
*** jungleboyj has quit IRC12:17
*** jungleboyj has joined #openstack-infra12:18
*** zehicle_at_dell has quit IRC12:24
*** dkliban has quit IRC12:27
*** jjmb has quit IRC12:35
*** dkranz has joined #openstack-infra12:36
*** dims has quit IRC12:40
*** dkranz has quit IRC12:41
*** dims has joined #openstack-infra12:42
openstackgerritA change was merged to openstack/requirements: assign a min version to pycadf  https://review.openstack.org/4292312:47
*** pabelanger has quit IRC12:55
*** cppcabrera has joined #openstack-infra12:57
*** adalbas has joined #openstack-infra12:57
*** alexpilotti has joined #openstack-infra13:01
*** ruhe has joined #openstack-infra13:04
*** weshay has quit IRC13:05
*** zehicle_at_dell has joined #openstack-infra13:07
*** mriedem has joined #openstack-infra13:09
markmcseeing a lot of these13:11
markmchttps://jenkins02.openstack.org/job/gate-grenade-devstack-vm/2834/console13:11
markmcanyone know what the cause is?13:11
*** cppcabrera has left #openstack-infra13:11
mordredmarkmc: looking13:11
*** dina_belova has joined #openstack-infra13:11
*** dina_belova has quit IRC13:12
*** dina_belova has joined #openstack-infra13:12
mordredmarkmc: http://logs.openstack.org/56/42756/5/check/gate-grenade-devstack-vm/c05ec42/logs/devstack-gate-setup-workspace-old.txt13:13
mordredmarkmc:13:13
mordred+ timeout -k 1m 5m git remote update13:13
mordredFetching origin13:13
mordrederror: RPC failed; result=52, HTTP code = 013:14
openstackgerritwill soula proposed a change to openstack-infra/jenkins-job-builder: Adding AnsiColor Support  https://review.openstack.org/4312113:14
mordredfatal: The remote end hung up unexpectedly13:14
mordrederror: Could not fetch origin13:14
mordredclarkb, jeblair, fungi ^^ looks like we're still slamming git.o.o13:14
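A minimal sketch of the failing step, assuming a workspace whose origin already points at git.openstack.org; the retry loop is illustrative and not something devstack-gate is quoted as running:
    cd /opt/stack/new/nova        # hypothetical workspace path
    git remote set-url origin https://git.openstack.org/openstack/nova
    for attempt in 1 2 3; do
        # same timeout devstack-gate uses in the log above
        timeout -k 1m 5m git remote update && break
        echo "fetch attempt $attempt failed; retrying" >&2
        sleep 30
    done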
markmcok13:15
*** dina_belova has quit IRC13:17
*** acabrera has joined #openstack-infra13:17
*** acabrera has left #openstack-infra13:18
*** vogxn has joined #openstack-infra13:19
*** weshay has joined #openstack-infra13:19
*** anteaya has joined #openstack-infra13:25
*** jjmb has joined #openstack-infra13:25
mordredmarkmc: wanna hear something funny?13:29
markmcmordred, perhaps :)13:29
mordredhacking can't pass unittest in python 3.3 because of its python 3 compatibility checks13:29
mordredbecause the good/bad strings throw different errors :)13:30
markmcnice13:30
mordredyah13:30
*** afazekas has joined #openstack-infra13:30
*** afazekas has quit IRC13:31
*** lbragstad has joined #openstack-infra13:32
*** sandywalsh has quit IRC13:43
*** jjmb has quit IRC13:46
*** changbl has quit IRC13:46
*** ftcjeff has joined #openstack-infra13:48
*** prad_ has joined #openstack-infra13:53
jeblairthings would be a lot better if the neutron job weren't flakey13:54
anteayamorning jeblair13:57
anteayaI might be wrong, but am I seeing we have 10 devstack precise nodes available?13:57
anteayawhen we normally have about twice as many13:57
jeblairanteaya: http://tinyurl.com/kmotmns13:58
anteayaah so the chart at the very bottom of the long check queue on zuul status page is just saying we have very few free, since it is entitled "available test nodes"13:59
jeblairyep13:59
anteayaI knew my interpretation didn't make sense13:59
anteayathanks13:59
* anteaya is refraining from acknowledging mordred since he is on vacation14:00
mordredmorning anteaya14:00
anteayaWednesday 8am, the timeline has got to be Central time, for some reason14:00
mordredjeblair: remember the tox 1.6 issue where it stopped using our mirror?14:01
anteayawhich is weird since I know you are on Pacific time jeblair14:01
anteayamorning mordred14:01
mordredjeblair: I filed a bug and hpk said that $HOME thing should not have merged/been in 1.614:01
mordredhe's working on a 1.6.1 that reverts that change14:01
*** burt has joined #openstack-infra14:01
mordredand I've just tested it and it works well14:02
jeblairyay14:02
jeblairmordred: did we discuss using afs to share the git repos across several git servers?14:03
mordredjeblair: we did not - but I think it's an excellent idea14:03
mordredjeblair: because, honestly, it's not file io that's a problem - it's the cpu cost associated with calculating what's needed14:03
jeblairmordred: it is seriously worth considering; local caching + invalidation would be good; we'd just need to make sure the locking model works14:04
mordredjeblair: so, quite honestly, if all of our nodes were afs clients and read from /afs/infra.openstack.org/git/$project14:04
jeblairmordred: (all this as opposed to having gerrit star-replicate to n workers)14:04
mordredyes14:04
mordredjeblair: oh, were you thinking afs to get the repos to the gitX servers?14:05
jeblairmordred: oh, heh, well, afs can be somewhat bandwidth inefficient; so i'm not sure how well having everything use it would work; i was just thinking of a pool of git servers.14:05
jeblairmordred: yeah14:05
mordredgotcha14:05
mordredwell, here's the thing14:05
mordredwe can start with that14:05
mordredand it'll either work or not14:05
mordredand then if that is set up - then we can look at whether access via /afs on slaves is better or worse14:06
mordredpretty easily14:06
jeblairyep.  though by start you mean 'start looking into after we implement our current plan', right? :)14:06
*** dkliban has joined #openstack-infra14:06
mordredjeblair: god yes14:06
jeblairso we have some real data now14:06
jeblairi mean, it's only like 2 data points14:07
mordredjeblair: did the az2 image update work?14:07
mordredit looked to me like it did from looking at nova image base information14:07
jeblairbut we know that if 100 clients hit git.o.o, we push 20-25Mbit and peg user cpu time14:07
jeblairmordred: do you get the idea that top was lying to us?14:08
*** _TheDodd_ has joined #openstack-infra14:08
mordredtough to say, honestly14:08
jeblairmordred: there's like no i/o.14:09
mordredjeblair: that doesn't realy surprise me14:09
mordredthere's tons of ram on the boxes14:09
jeblairso it's all cpu (and possibly file locking; not sure how that would show up)14:09
mordredI'm pretty sure it's all in the fs cache layer14:09
mordredfile locking I _think_ would show up in sys wait time14:10
jeblairthat's what i'd expect, unless git is doing something on its own14:10
*** ryanpetrello has joined #openstack-infra14:11
*** vogxn has quit IRC14:12
*** michchap has quit IRC14:12
*** dina_belova has joined #openstack-infra14:13
jeblairmordred: i'm reading about git's lockfile usage (to understand current behavior); i note that it _is_ compatible with afs.14:16
mordredneat14:16
mordredyou know - afs client caching may make total access not ridiculous14:16
mordredsince most of the pack files should wind up cached client side14:16
jeblairmordred: it's the initial population i'm worried about; though, i suppose if the devstack nodes have a fully populated afs cache from image creation... maybe not so bad.14:17
*** dina_belova has quit IRC14:17
*** ruhe has quit IRC14:17
mordredjeblair: yah. that's what I was thining14:17
mordredthinking14:17
jeblairmordred: i've uh, never used an afs client that was cloned from another afs client.14:18
jeblairmordred: those two worlds have not collided for me.  :)14:18
mordredlove it14:18
anteayamordred: this was the jenkins failure on your disable salt globally patch: http://logs.openstack.org/30/43030/1/check/gate-ci-docs/1cdc607/console.html.gz can I do "recheck no bug"?14:20
mordredanteaya: yes.14:20
mordredthe failure is a git clone failure14:20
dhellmanngood morning14:20
*** vogxn has joined #openstack-infra14:21
Alex_Gaynordhellmann: morning (I assume you're not at home?)14:21
jeblairwell, crap; it looks like zuul is stuck again14:21
mordredmorning dhellmann !14:21
anteayamordred: that was what I thought, thanks for confirmation14:21
dhellmannAlex_Gaynor: it's still morning here at home :-)14:21
anteayamorning dhellmann Alex_Gaynor14:21
anteayajeblair: :(14:21
*** datsun180b has joined #openstack-infra14:22
openstackgerritA change was merged to openstack-infra/zuul: SIGUSR2 logs stack traces for active threads.  https://review.openstack.org/4295914:22
mordredAlex_Gaynor: there's a little bit of pushback from clayg on syncing with global requirements - I responded that it's not urgent and that perhaps sdague and I should chat with him when we both get back from vacation14:22
mordredAlex_Gaynor: but then I just realized that you have a foot in both worlds14:22
jeblairi forced that ^ so it's in place after the restart14:22
mordredjeblair: great. I support you in that14:22
Alex_Gaynormordred: Ok, I can take a look at trying to push that along, I need to take a bit and figure out what the most effective advocacy strategy is going to be14:23
anteayajeblair: do we need to change channel status do you think?14:23
Alex_GaynorI think so, zuul seems totally stalled14:24
mordredAlex_Gaynor: yeah  - I think we might need to articulate better the reasons we want it14:24
dhellmannso I'd like to set up WSME on launchpad so bugs are updated when things happen in gerrit. IIRC, to do that for ceilometer I added a user (or group?) to our Drivers group. Is that right?14:24
mordredAlex_Gaynor: also, I think we have a little bit of the traditional push-back against 'openstack is one project' (I don't mean that to be nasty, just that there are remaining pockets of resistance to that decision, and I think they color openstack-centric tasks at times, which means extra care needs to be taken with justification)14:25
*** dguitarbite has joined #openstack-infra14:25
jeblairi'm restarting zuul14:26
Alex_GaynorThis is going to cause us to lose all current pipelines? Are there any thoughts about putting that state somewhere persistent?14:27
jeblairAlex_Gaynor: i've saved a copy14:27
Alex_Gaynorjeblair: oh, cool14:27
*** ladquin_afk is now known as ladquin14:29
jeblairi'm adding them back with a 30 second delay between each.14:30
mordredjeblair: nice14:31
mordredjeblair: you know - I wonder - when zuul re-queues things after a gate reset - perhaps it should put a delay between each gearman request? mitigate the herd a little bit?14:31
*** jungleboyj has quit IRC14:33
jeblairmordred: yeah, i was suggesting that to clarkb yesterday as something to explore; we need to be careful that we don't get too backed up14:34
mordredyah14:34
jeblairthat's the thing with queuing systems; if you can't keep up with the throughput, you can get into situations where you never recover14:34
jeblairso i'm much more focused on making sure we can keep up14:34
*** yolanda has joined #openstack-infra14:34
jeblairmordred: that '30 second delay' i'm doing?  that's 15 minutes before the gate queue is populated again.14:35
Alex_GaynorSo is zuul CPU bound, or something else?14:35
yolandahi, i'm trying to deploy zuul using an apache frontend that is on another machine, but i'm having a problem with serving git repos; has anyone done something similar?14:35
yolandathe problem i have is with aliasmatch; it refers to AliasMatch ^/p/(.*/objects/[0-9a-f]{2}/[0-9a-f]{38})$ /var/lib/zuul/git/$1, which is on the zuul machine and cannot be accessed from apache14:36
jeblairAlex_Gaynor: zuul is not operating near its limits14:36
Alex_Gaynorjeblair: so it's git / gearman / gerrit ?14:36
jeblairAlex_Gaynor: but if it were, it would be cpu bound14:36
jeblairAlex_Gaynor: the current problem is we can't serve git repos fast enough for all the test jobs14:37
Alex_Gaynorjeblair: ok, so we're sure it's that14:37
*** thomasbiege1 has joined #openstack-infra14:37
jeblairAlex_Gaynor: which is why today's project is load-balancing that across multiple servers14:37
Alex_Gaynorjeblair: surely someone has had this problem before right... ? We can't be the first people to to be git-bound14:37
jeblairAlex_Gaynor: zuul's problem is that it has a bug that we haven't been able to identify due to inadequate logging and lack of ability to get a stacktrace14:38
jeblairAlex_Gaynor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=2314:38
jeblairthat's zuul ^14:38
jeblairAlex_Gaynor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=4314:38
jeblairthat's git ^14:38
Alex_Gaynorconsistent 10-15MBps, that's pretty cool14:39
*** thomasbiege1 has quit IRC14:39
jeblairmordred, clarkb, zaro: when the gearman server restarts, i think the executorworkerthread dies, which means the offline-on-complete feature fails14:42
*** xBsd has quit IRC14:42
jeblairmordred, clarkb, zaro: which is why a lot of jobs are showing up as lost right now -- they are re-running on hosts that should have been offlined14:42
*** michchap has joined #openstack-infra14:43
jeblairso for the moment, if we stop zuul, we need to delete all the slaves14:44
anteayaouch14:45
*** pblaho has quit IRC14:45
*** gordc has joined #openstack-infra14:46
*** changbl has joined #openstack-infra14:46
*** pabelanger has joined #openstack-infra14:46
*** AJaeger has joined #openstack-infra14:47
openstackgerritRyan Petrello proposed a change to openstack-infra/config: Provide a more generic run-tox.sh.  https://review.openstack.org/4314514:48
*** jungleboyj has joined #openstack-infra14:48
mgagnemordred: what was your gerrit search filter you sent a couple of weeks ago?14:49
jeblairmordred: so none of the az2 nodes are launching jenkins slaves.14:50
jeblairi spot-checked one and got this:14:50
jeblair$ java -version14:50
jeblairSegmentation fault (core dumped)14:50
*** michchap has quit IRC14:51
Alex_Gaynorawesome.14:51
ttxmordred: late pong14:51
jeblair3 makes a pattern, right?14:51
anteayattx: since you are here: https://review.openstack.org/#/c/43002/14:53
*** rnirmal has joined #openstack-infra14:53
*** kspear has quit IRC14:54
*** kspear has joined #openstack-infra14:54
*** ruhe has joined #openstack-infra14:57
*** _TheDodd_ has quit IRC14:59
*** _TheDodd_ has joined #openstack-infra15:01
*** w_ is now known as olaph15:02
ttxanteaya: looking15:04
anteayathanks15:04
Alex_Gaynorjeblair: can you link the review for load balanced git?15:07
*** UtahDave has joined #openstack-infra15:07
*** mrodden has quit IRC15:08
jeblairAlex_Gaynor: https://review.openstack.org/#/c/42784/15:08
jeblairAlex_Gaynor: I think we're also going to do this https://review.openstack.org/#/c/43012/15:08
ttxanteaya: reviewed15:08
anteayattx thank you15:08
*** vogxn has quit IRC15:08
*** rnirmal has quit IRC15:10
openstackgerritAnita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository  https://review.openstack.org/4300215:13
*** dina_belova has joined #openstack-infra15:13
jeblairpleia2, clarkb, Alex_Gaynor: I'm going to spin up a few copies of git.o.o of different sizes (8, 15, 30) for testing.15:15
jeblairpleia2, clarkb, Alex_Gaynor: if we are cpu bound, it looks like the 8gb machines (4vcpus) might be the sweet spot (half the cpus with 1/4 the ram of a 30gb vm)15:15
anteayamordred can I get your feedback on the openstack/governance name, please?15:16
anteayaif you don't like it, can I get a better suggestion?15:16
*** dina_belova has quit IRC15:18
*** mrodden has joined #openstack-infra15:19
*** mkerrin has quit IRC15:20
*** mkerrin has joined #openstack-infra15:20
openstackgerritAnita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository  https://review.openstack.org/4300215:20
*** mkerrin has quit IRC15:23
mordredanteaya: openstack/governance sounds great15:24
mordredjeblair: wow. segfault. nice15:24
mordredmgagne: gerrit search filter ... for things I should review?15:25
jeblairmordred: yeah, i'm going to leave that and assume it's an image problem15:25
mgagneyes15:25
mordredjeblair: k15:25
mordredjeblair: gosh, do we need to make the ssh check an "ssh and run java --version" check?15:25
jeblairslowing down the rate of adding new nodes is mildly helpful atm anyway.15:25
mordredmgagne: I do this: https://review.openstack.org/#/q/watchedby:mordred%2540inaugust.com+-label:CodeReview%253C%253D-1+-label:Verified%253C%253D-1+-label:Approved%253E%253D1++-status:workinprogress+-status:draft+-is:starred+-owner:mordred%2540inaugust.com,n,z15:26
jeblairmordred: hrm, i wonder if the template host was broken, or if the image created from the template host was broken.15:26
mordredjeblair: good question - template host still around?15:26
jeblairit would be difficult to find out, since the template host is deleted almost immediately15:26
mordredyeah. I was afraid of that15:26
mordredmgagne: and I scan that list, and star things that I need to review, then I do: https://review.openstack.org/#/q/is:starred+-label:CodeReview%253C%253D-1+-label:Verified%253C%253D-1,n,z15:27
mordredand unstar things when I'm done with them15:27
mordredI'm doing that every morning when I wake up now15:27
mordredit's helping15:27
mordred(although getting the list under control and then reviewing every morning also helped)15:27
mordredjeblair: az2 was the one having issues yesterday though15:28
mordredjeblair: mark the image for delete and see if it can generate a real one today?15:28
mgagnemordred: thanks!15:29
*** mkerrin has joined #openstack-infra15:30
jeblairmordred: yeah, but we're doing ok on the other 2 for now, this will help with the git.o.o load (a little) so i'm in no rush to fix15:30
mroddenanyone seen tox failing with "no such option: --pre" on the pip install step?15:31
mordredanteaya: reviewed15:31
mordredjeblair: ok15:31
mroddenapparently tox 1.6.0 is on virtualenv 1.9.1 which has pip 1.3.1 embedded which doesn't support --pre15:31
mordredmrodden: link?15:31
markmc*sob* my change approved 8 hours ago is now 34th in the gate queue *sob*15:31
mroddennot sure why i am hitting it all of a sudden15:31
markmcwhere's my violin?15:31
mordredmrodden: that souds like the glanceclient issue from the other day that I expect to be fixed15:32
mroddenmordred: its in my local env15:32
mroddenoh15:32
mordredmrodden: update your glanceclient15:32
mordredmrodden: evil happened15:32
mroddenlol15:32
mroddenwill do15:32
mroddenthanks15:32
* dansmith feels sorry for zuul today15:33
mordreddansmith: it likes it15:33
dansmithmordred: oh, a little masochistic, is it?15:34
mordredheck yes15:34
dansmithkinky.15:34
jeblairmarkmc, dansmith: current major issues: we can't serve git repos fast enough for all the tests we're running; the neutron job appears flakey.15:36
dansmithdammit neutron!15:37
markmcjeblair, yeah, was following along15:37
mriedemi know one guy in here that likes to give out punishment, might be a good match for zuul :)15:37
markmcdon't mind the whining from the cheap seats15:37
*** nati_ueno has joined #openstack-infra15:37
jeblairmarkmc, dansmith: minor issues: zuul has a bug that causes it to stop occasionally; one of our test images has a java that segfaults15:37
jeblairand a few more minor than that15:37
markmcheh, "minor issues"15:37
dansmithnice, I saw the big reset this morning15:38
mordredmarkmc: gotta love feature freeze, when the two of those are 'minor'15:38
jeblairmarkmc: yeah when "zuul stops working" is a minor issue, you know we're having fun.  :)15:38
mordredoh - corollary to that issue - debugging a hung python program is apparently not easy15:38
markmcjeblair, not whining honestly, but how did https://review.openstack.org/#/c/43060/ end up at the bottom after the restart ?15:38
markmcjeblair, shoulda been near the top, no?15:39
jeblairmarkmc: erm, it's worse than that.  :(  it was at the top, but due to a recently discovered very minor issue, when i restarted zuul, several of the test nodes were not off-lined as they should have been15:39
* markmc puts it down to karma for approving his own change15:39
jeblairmarkmc: so it got dequeued due to an erroneously failing test15:40
markmcjeblair, ok15:40
jeblairmarkmc: sorry :(15:40
* markmc shrugs15:40
openstackgerritAnita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository  https://review.openstack.org/4300215:40
jeblairmarkmc: we now know that if that happens again we need to clean up the test nodes until we can automate that case15:40
anteayamordred: thanks15:40
markmcjeblair, cool15:40
jeblairmarkmc: that's what most of the "LOST" jobs on the screen are15:41
*** reed has joined #openstack-infra15:41
markmcjeblair, ok, thanks15:41
markmcjeblair, that's a particularly sad name for a status15:41
markmcLOST,LONELY15:41
markmcnow that would be sad15:41
jeblairmarkmc: or just "SAD"15:42
anteaya:(15:42
markmcjeblair, indeed :)15:42
openstackgerritAndreas Jaeger proposed a change to openstack-infra/config: Build Basic Install Guide for openSUSE  https://review.openstack.org/4298815:44
*** dkranz has joined #openstack-infra15:46
*** nayward has joined #openstack-infra15:49
*** SergeyLukjanov has joined #openstack-infra15:49
*** senk has joined #openstack-infra15:51
chmouelso for the LOST thing should I just do a recheck no bugs?15:51
*** nati_ueno has quit IRC15:51
Alex_Gaynorchmouel: yup15:51
chmouelAlex_Gaynor: tks15:52
* chmouel didn't feel like reading the full scrollback :-p15:52
*** rfolco has joined #openstack-infra15:53
*** dina_belova has joined #openstack-infra15:54
*** vogxn has joined #openstack-infra15:56
*** pcm_ has quit IRC15:56
*** boris-42 has quit IRC15:57
*** mkerrin has quit IRC15:59
*** mkerrin has joined #openstack-infra15:59
jeblairpleia2, fungi, clarkb: the git puppet manifest has some problems; an selinux command failed during the first run, and i think there may be an rpm/pip conflict on the pyyaml package16:00
*** mkerrin has quit IRC16:01
*** mkerrin has joined #openstack-infra16:01
*** mkerrin has quit IRC16:02
clarkb:(16:02
clarkbjeblair It needs a firewall update too16:02
clarkbjeblair was that run on a new host or the existing?16:03
jeblairthat was easy enough to fix (pip uninstall pyyaml)16:04
jeblairclarkb: new hosts -- i'm spinning up test hosts for benchmarking16:04
clarkbcool. let me know if you catch other puppet things I will update that manifest soon16:04
fungijeblair: ah, yes i believe i pointed out the selinux thing to pleia2 before. i think the issue is that enabling selinux requires a reboot, and the command to adjust selinux won't work until it's activated16:05
*** ruhe has quit IRC16:05
fungii believe it was an oversight caused by hpcloud enabling selinux by default and rackspace not16:05
jeblairyay16:05
clarkbfungi so activate; reboot; puppet?16:05
*** sridevi has joined #openstack-infra16:05
*** jungleboyj has left #openstack-infra16:06
fungiclarkb: i think just reboot, but may need to manually activate selinux before doing so (though i think the puppet selinux module has already set it to be active after a reboot)16:06
srideviHi, can someone help me with this jenkins failure?16:06
*** AJaeger has quit IRC16:06
sridevihttps://review.openstack.org/#/c/34801/16:06
sridevianyone?16:07
anteayathanks markmc16:07
markmcanteaya, thank you16:08
anteaya:D16:08
anteayasridevi: I'll take a look16:09
sridevithanks anteaya16:09
sridevihttp://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-neutron/3076bcb/console.html.gz16:09
jeblairsridevi: that appears to be a real failure; it happens consistently for every test run for days now.16:09
sridevireal failure, you mean some bug in the patch? jeblair16:10
jeblairsridevi: yes16:10
srideviokay.16:10
jeblairsridevi: i'd recommend setting up a devstack environment and testing it locally there16:11
sridevijeblair: Hmm. But I don't see any error other than "ERROR:root:Could not find any typelib for GnomeKeyring"16:11
anteayaProcess leaked file descriptors.16:11
anteayait is in every failure log16:12
jeblairanteaya: that's harmless16:12
anteayajeblair: ah okay16:12
jeblairsridevi: it looks like the patch broke devstack, from the way the devstack log ends.16:12
srideviHmm16:13
jeblairsridevi: last line of this file: http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-full/984c01f/logs/devstacklog.txt.gz16:13
*** nayward has quit IRC16:13
pleia2hm, what brought in pyyaml?16:13
jeblairpleia2: jeepyb16:13
pleia2jeblair: via pip?16:14
pleia2(looking now)16:15
jeblairpleia2: i think the sequencing is off; it installed jeepyb first which would have easy_installed it using python setup.py install, then it tried to install the rpm16:15
jeblairpleia2: i think either we want to make jeepyb require-> the package, or else remove the package and let easy install do its thing16:16
*** SergeyLukjanov has quit IRC16:16
sridevijeblair: what is in the last line? "services=s-container"16:16
sridevi?16:16
reedhi guys, how are things going today?16:16
pleia2jeblair: I see, thanks16:16
* fungi is going to be out at the space needle and the science museum for a little while, but will be back on later this afternoon16:17
pleia2fungi: enjoy :)16:17
fungithanks pleia216:17
anteayafungi have fun at the space needle16:17
reedfungi, enjoy... and in  your free time comment on https://review.openstack.org/#/c/42998/ :)16:17
anteayareed: about the same as yesterday, zuul got stuck again this morning16:18
jeblairreed: not terribly well, i think we have at least a full day ahead of us16:18
reed:(16:18
reednot terribly well is hard to parse16:18
anteayasridevi: yes, that is the last line that ran in devstack, after that it broke16:18
*** rnirmal has joined #openstack-infra16:18
jeblairreed: heh, that seems appropriate somehow.  anyway, 'poorly'.  :)16:19
reednot terribly is a double negation, right? makes it a positive... well is positive ... double positive is bad? :)16:19
sridevianteaya: hmm16:19
anteayasridevi: the fact that devstack didn't finish is an indication that the patch affected the devstack installation16:19
reedjeblair, trying to assess how long it will take for https://review.openstack.org/#/c/42998/ to be evaluated and go through... two days?16:20
reed(it's my request for a staging server)16:20
anteayasridevi: so your patch affects swift and the swift container service couldn't install properly16:21
srideviokay16:21
*** ruhe has joined #openstack-infra16:21
anteayasridevi: here is the screen log for the swift container: http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-full/984c01f/logs/screen-s-container.txt.gz16:22
jeblairreed: i hope so; but this is a very exceptional time; we have unprecedented test load, several systems that need upgrading to deal with it, and only two core developers full-time (though i believe we are more than full-time at the moment)16:22
koolhead17hi all16:22
anteayahi koolhead1716:23
koolhead17anteaya: how have you been16:23
anteayakoolhead17: good thanks, trying to be helpful without getting in the way16:23
anteayabusy time right now16:23
jeblairreed: as soon as things are not on fire, i will review your and mrmartin's patches16:23
koolhead17reed: hi there16:23
koolhead17anteaya: what patch are we discussing16:24
clarkbjeblair: just a little more than full time :)16:24
clarkbjeblair: I am finally in a chair where I can focus. Is there anything I should look at first/immediately?16:24
jeblairclarkb: get the git.o.o load balanced stuff ready to go16:24
clarkbok16:25
reedjeblair, thanks16:25
jeblairclarkb: i'm working on some simple benchmarking (but obviously even simple benchmarking is going to take a bit)16:25
anteayakoolhead17: well, I was helping sridevi with his patch https://review.openstack.org/#/c/34801/ I have a patch up: https://review.openstack.org/#/c/43002/4 and two patches are under consideration hoping they will help the current jenkins/zuul/git issues: https://review.openstack.org/#/c/42784/ https://review.openstack.org/#/c/43012/16:26
anteayaso we have a few to choose from, koolhead17 :D16:26
anteayajeblair clarkb I don't think I know enough to be of use and don't want to slow you down, if there is something you think I can do to help, please tell me16:27
jeblairanteaya: thanks; fielding questions like that ^ is _very_ helpful16:28
anteayajeblair: very good, I shall endeavour to do my best16:28
*** SergeyLukjanov has joined #openstack-infra16:28
clarkbjeblair: any interest in updating the git.pp to possibly run on precise sans cgit?16:29
*** cthulhup has joined #openstack-infra16:29
clarkbjeblair: not sure if you are interested in testing that, but I think it would be a small change16:29
jd__I've a LOST job here https://review.openstack.org/#/c/42642/ should I open a bug?16:30
anteayajd__: yes16:30
anteayano no bug16:30
anteayait is a a result of a zuul restart this morning16:30
anteayathe gearman server lost a thread16:30
jd__anteaya: define "morning"? :)16:30
anteayaand as a result there were lost jobs16:30
anteayasorry yes, you are right16:30
jd__ack, I'll recheck no bug then16:30
anteayaabout 3 hours ago16:30
anteayayes, recheck no bug16:31
anteayathanks16:31
jeblairclarkb: no cgit that way16:31
jd__thanks anteaya16:31
anteaya:D16:31
clarkbjeblair: correct, it would just be a repo mirror16:31
*** markmc has quit IRC16:31
jd__btw I wonder, what/where is openstackstatus used?16:31
jeblairclarkb: haven't we started using the cgit server?16:31
anteayajd__: where do you see openstackstatus?16:31
anteayaI'm on help desk as the fires are being fought16:32
jeblair#status alert LOST jobs are due to a known bug; use "recheck no bug"16:32
openstackstatusNOTICE: LOST jobs are due to a known bug; use "recheck no bug"16:32
*** ChanServ changes topic to "LOST jobs are due to a known bug; use "recheck no bug""16:32
*** dina_belova has quit IRC16:32
clarkbjeblair: a little yes. we would probably end up needing to do an additional set of proxying for cgit back to the centos servers. now that I think about it nevermind16:32
jeblairclarkb: yeah, i think that why we decided to just throw hardware at it for now16:33
jd__anteaya: I meant the bot, but now I see it changes the topic :)16:33
anteayajd__: ah okay16:33
jeblairjd__: it needs some work; it's not very reliable yet16:33
jeblairjd__: eventually we'd like it in all the channels and to have it update web pages16:33
jeblairjd__: it's been a while since we've had time to hack on that16:34
clarkbjeblair: I am not finding python-yaml or pyyaml in our puppet manifest for cgit. It looks like jeepyb installs it and something on centos is installing it globally? And since centos doesn't do site-packages they interfere?16:34
jd__jeblair: what's its Git repository?16:34
*** jpich has quit IRC16:34
jeblairjd__: openstack-infra/statusbot16:34
clarkbjeblair: I think I am going to ignore that for now as you have a work around16:34
jeblairjd__: one of the pre-reqs for all channels is this bug (let me fetch it)16:34
*** gyee has joined #openstack-infra16:34
*** sridevi has quit IRC16:35
jeblairjd__: https://bugs.launchpad.net/openstack-ci/+bug/119029616:35
uvirtbotLaunchpad bug 1190296 in openstack-ci "IRC bot to manage official channel settings" [Medium,Triaged]16:35
jeblair(i don't want to add it to 30 channels manually)16:35
jeblairjd__: and then it has problems reconnecting on netsplits16:35
jeblairdon't know if there's a bug for that16:36
jeblairclarkb: sounds good16:36
jd__jeblair: ack16:36
*** dina_belova has joined #openstack-infra16:37
mroddenwow that is dirty...16:38
anteayamrodden: what are you referencing?16:38
mroddenwhen you pip install virtualenv it drops the latest version it can find of pip into $SITE_PACKAGES/virtual_env/16:38
mroddenand it never updates it from then on16:39
mroddenand that is what it uses when it creates a new virtualenv16:39
mroddenso my virtualenvs were all stuck at pip 1.2.116:39
mroddensorry  $SITE_PACKAGES/virtualenv_support/16:39
mgagnemordred: since you are the gerrit search master to me, how can you exclude changes which have been reviewed by yourself?16:40
clarkbmrodden: correct because virtualenv vendors pip and setuptools and distribute16:40
mroddenclarkb: yeah but for some reason it had pip 1.2.1 and also pip 1.4.1 and was only using 1.2.116:41
mroddenit doesn't enforce that it copies the correct version from that spot16:41
mrodden:(16:41
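A rough way to see the behaviour mrodden describes (the virtualenv_support layout is typical of virtualenv 1.x and not guaranteed on every version):
    # where virtualenv is installed, and which pip/setuptools archives it bundles
    VENV_DIR=$(python -c "import virtualenv, os; print(os.path.dirname(virtualenv.__file__))")
    ls "$VENV_DIR/virtualenv_support/"
    # upgrading virtualenv refreshes the copies it seeds into new virtualenvs
    pip install -U virtualenv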
anteayamgagne: I'm not sure of his status, he was last here an hour ago16:41
*** Dr01d has quit IRC16:41
mgagneanteaya: thanks, I can wait =)16:42
anteayak16:42
anteayaI'm sure once he is on planes he will pop in again16:42
anteayaI am afk for about 30 minutes, I have to give a hand to a family member16:43
*** dkranz has quit IRC16:45
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278416:45
clarkbjeblair: ^ I believe that is in a reviewable state. I am going to --noop apply it to git.o.o now16:45
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278416:47
clarkband that addressed one more review comment16:47
*** AJaeger has joined #openstack-infra16:47
jeblairclarkb: the switch could be hairy; if it doesn't work, we end up with a lot of failed jenkins jobs16:49
clarkbjeblair: yup16:49
clarkbjeblair: how do you feel about putting jenkins* into shutdown mode while we do it?16:49
jeblairclarkb: may want shut down puppet and apply it to a test node first16:49
*** cthulhup has quit IRC16:49
jeblairclarkb: at the current rate, you'd still have to wait like 30 minutes for the git processes to finish16:50
*** vogxn has quit IRC16:50
clarkbjeblair: just the git processes?16:50
clarkbwow16:50
jeblairclarkb: last i looked, the devstack-gate prep steps were taking a looong time16:50
jeblairclarkb: i have 3 test nodes we can run it on.  :)16:51
jeblair 8 192.237.168.22616:51
jeblair15 162.209.12.12716:51
jeblair30 198.101.151.516:51
jeblairclarkb:  ^16:52
jeblairclarkb: (first column is memory)16:52
clarkbjeblair: ok I can hijack one of them and change its certname so that it gets the haproxy stuff16:52
jeblairclarkb: please; i ran 'puppet apply --test --certname git.openstack.org'16:52
clarkbjeblair: also, this is a multistep process. The change above will only add haproxy and move the apache vhosts and git daemon to offset ports16:53
jeblairclarkb: take the 15g one16:53
clarkbjeblair: it won't do load balancing until we get another change or two in to replicate to the other hosts and balance across them with haproxy16:53
clarkbjeblair: ok16:53
jeblairclarkb: yeah, i like the process; it's just the port move that i'm worried about16:53
clarkbjeblair: should I be running a bunch of clones against the 15g node while I apply puppet?16:54
jeblairclarkb: er, were their firewall changes?16:54
jeblairthere even16:54
clarkbjeblair: ya my latest patchset adds firewall changes16:54
clarkbto allow 4443 and 8080 and 2941816:55
jeblairah, i see it now.16:55
clarkbI am not restricting access to those ports as they are all read only anyways16:55
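A quick smoke test for the proxied layout; the exact mapping assumed here (haproxy on the standard ports, apache moved to 8080/4443, git-daemon on 29418) is inferred from the discussion, not quoted from the change:
    # through haproxy on the normal ports
    git ls-remote git://git.openstack.org/openstack/nova HEAD
    curl -sk -o /dev/null -w '%{http_code}\n' 'https://git.openstack.org/openstack/nova/info/refs?service=git-upload-pack'
    # directly against the assumed offset back ends
    git ls-remote git://git.openstack.org:29418/openstack/nova HEAD
    curl -sk -o /dev/null -w '%{http_code}\n' 'http://git.openstack.org:8080/openstack/nova/info/refs?service=git-upload-pack'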
jeblairclarkb: honestly, i wouldn't worry about it.  if there's a blip we can deal.  it's more that if it's actually offline for more than 30 seconds we would be very unhappy16:56
clarkbstarting with a --noop on the 15g node16:56
jeblairclarkb: we can also do the jenkins shutdown idea, to reduce the impact16:56
clarkbjeblair: the port change for apache didn't go in so haproxy wouldn't start. Looking into that now16:57
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278416:59
clarkbthat should do it, testing16:59
*** nati_ueno has joined #openstack-infra16:59
*** ruhe has quit IRC17:04
clarkbapache isn't letting go of 443 and 80. Looks to be set to listen on those ports in the default configs17:04
pleia2clarkb: yeah, /etc/httpd/conf/httpd.conf has Listen 80 (looking around for https)17:06
pleia2might have a /etc/httpd/conf.d/ssl.conf too17:06
clarkbpleia2: yup17:06
*** portante has joined #openstack-infra17:06
clarkbpleia2: we are not managing those with puppet are we?17:06
pleia2clarkb: nope17:06
portanteclarkb: ran into a swift tox issue, http://paste.openstack.org/show/44776/17:07
*** nicedice_ has joined #openstack-infra17:07
*** ^d has joined #openstack-infra17:07
portantedo you know what I should do to fix this?17:07
clarkbjeblair: appropriate to just copy what we have there now into a puppet template and toggle the ports?17:07
clarkbjeblair: any better ideas?17:07
portanteclarkb: that is a swift tox issue related to missing "pbr" package17:07
clarkbportante: it looks like you have an old version of pbr installed. can you try tox -re pep8?17:08
*** david-lyle has quit IRC17:08
*** ftcjeff_ has quit IRC17:08
*** ftcjeff has quit IRC17:08
portantek17:08
jeblairclarkb: apache module doesn't deal with it?17:08
clarkbjeblair: oh maybe /me looks17:08
*** david-lyle has joined #openstack-infra17:08
*** ftcjeff has joined #openstack-infra17:08
*** ftcjeff_ has joined #openstack-infra17:09
*** SergeyLukjanov has quit IRC17:09
*** UtahDave has quit IRC17:09
*** dina_belova has quit IRC17:10
portanteclarkb: weird, old version in /usr/lib but why should that affect tox?17:11
clarkbportante: if you have site packages enabled in tox it will use your site packages17:12
clarkbportante: site packages should probably be disabled if it is enabled (I believe the only project that needs it is nova for libvirt)17:12
*** dina_belova has joined #openstack-infra17:12
clarkbjeblair: ssl.conf is already vendored by us (and not by puppetlabs-apache). I will just do the same with httpd.conf and set the ports dynamically17:13
*** fbo is now known as fbo_away17:13
jeblairclarkb: sounds good17:13
BobBallwow the gate is queued up a lot! I hadn't been watching!17:14
pleia2clarkb: right, sorry, I did use the ssl one for our certificates (I should not rely on memory!)17:14
*** dina_belova has quit IRC17:15
burtspeaking of the gate: will 38697,2 automatically get restarted, or should I do a reverify no bug ?17:15
burt(looks like the python27 job was killed in the middle, https://jenkins01.openstack.org/job/gate-nova-python27/1231/console)17:16
*** lifeless has quit IRC17:16
portanteclarkb: I believe tox's default is to NOT use global packages, and I can't find anything in our tox.ini file that sets it to true17:16
clarkbportante: correct the default should be to not use it. The way to toggle it is with sitepackages = true iirc17:17
*** lifeless has joined #openstack-infra17:17
clarkbportante: however, I think your virtualenvs may be stale as well17:17
portanteI removed my entire .tox tree17:17
clarkbportante: if you do a .tox/pep8/bin/pip freeze do you see pbr17:17
clarkbportante: oh. Do you still see the error?17:17
*** zaro has quit IRC17:17
portantenot now, because I removed the /usr/lib/python2.7/site-packages/pbr* directory in order to make progress17:18
portanteclarkb: and yes, now I do see the correct pbr version in the freeze output17:19
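Roughly the checks clarkb walked through, as a sketch to run from the swift checkout:
    # is the env accidentally sharing global site-packages?
    grep -n sitepackages tox.ini
    # rebuild the env and confirm the venv's own pbr is the one being used
    tox -re pep8
    .tox/pep8/bin/pip freeze | grep -i pbr
    .tox/pep8/bin/python -c 'import pbr; print(pbr.__file__)'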
*** ryanpetrello has quit IRC17:19
*** ryanpetrello has joined #openstack-infra17:20
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278417:20
anteayaback17:20
clarkbportante: this is for swift? I am going to take a quick peek at the tox.ini17:21
*** mordred has quit IRC17:22
portanteclarkb: yes, thanks17:22
*** dmakogon_ has joined #openstack-infra17:22
anteayaBobBall: yes, large queue much work happening to address it17:23
*** rcleere has joined #openstack-infra17:23
clarkbpleia2: any idea of how to make selinux allow apache to listen on ports 8080 and 4443?17:23
jeblairclarkb: semanage port -a -t http_port_t -p tcp 808017:24
clarkbjeblair: we should puppet that :)17:24
anteayaburt: right now my best advice is to reverify17:24
anteayaif I am wrong it is on me17:24
* clarkb looks at puppet selinux docs17:25
pleia2clarkb: I'll poke around the puppet module17:25
jeblairclarkb: shouldn't be hard (if it isn't already) semanage lets you query and add17:25
*** yolanda has quit IRC17:25
*** jpeeler has quit IRC17:25
burtanteaya: thanks, will do17:25
*** ruhe has joined #openstack-infra17:26
anteayaburt welcome17:26
clarkbjeblair: I suppose I can add a couple execs if nothing else17:26
pleia2clarkb: actually, puppet module won't do this, we'll probably need to do something like I did with restorecons17:27
jeblairwhat _does_ the module do? :)17:27
pleia2turns it on and off, loads more modules17:27
pleia2it's pretty simple17:27
jeblaireverything except managing selinux :)17:27
pleia2yeah, there is at least one managing one out there but it wasn't very good17:28
*** BobBall is now known as BobBallAway17:28
jeblair:(17:28
*** mordred has joined #openstack-infra17:30
jeblairgah, one of my test worker nodes is a dud; takes 1:40 to clone nova alone (standard is 0:22)17:31
jeblair(i find i'm benchmarking the clients before i can benchmark the server)17:31
pleia2clarkb: we'll also need to add the policycoreutils-python package (that's what has semanage)17:32
clarkbpleia2: ya just discovered that17:32
clarkbjeblair: :(17:32
jeblairthe other 9 are ok though.  :)17:33
anteayamordred: I think this was the only comment I saw directed at you since you were last here: <mgagne> mordred: since you are the gerrit search master to me, how can you exclude changes which have been reviewed by yourself?17:33
*** jpeeler has joined #openstack-infra17:33
*** arezadr has quit IRC17:35
pleia2clarkb: looks like selinux already gave 8080 away: http_cache_port_t              tcp      3128, 8080, 8118, 8123, 10001-1001017:36
pleia2get a "/usr/sbin/semanage: Port tcp/8080 already defined" error when trying to set it again17:36
*** morganfainberg is now known as morganfainberg|a17:38
pleia2ah: semanage port -m -t http_port_t -p tcp 8080 (-m to modify, rather than -a to add port def)17:38
clarkbpleia2: I am going to brute force it to allow other potential ports. I think I can do this with the onlyif exec clause17:38
clarkbor I can use -m thanks17:38
Alex_Gaynor:/ we really need fewer failures in the gate pipeline17:39
mordredmgagne: you can't17:41
mordredmgagne: that's the reason I do the two passes with the star17:41
mgagnemordred: sad panda. Sad that you can't replicate the behaviour of the "Previously Reviewed By" section17:42
*** cthulhup has joined #openstack-infra17:42
mgagnemordred: is this section openstack specific?17:42
*** rnirmal has quit IRC17:43
*** SergeyLukjanov has joined #openstack-infra17:43
mordredmgagne: yes. we had to write java to get that17:44
mordredAlex_Gaynor: if anyone ever says that testing code as it's uploaded rather than doing the work we do to test as it would land is sufficient, they should watch our gate resets17:46
*** cthulhup has quit IRC17:47
Alex_Gaynormordred: Seriously. A decent portion of recent failures are from flaky tests or people approving patches before their jenkins run happens though. We really need to cut down on this17:47
mordredevery time something fails in the gate pipeline, it's a testament to just how complex this openstack thing we're testing really is. oy17:47
Alex_Gaynors/this/those/17:47
Alex_Gaynoreach one costs us like an hour17:47
mordredAlex_Gaynor: yes. we really do17:47
mordredand _seriously_ ? people are approving in this climant before the check job finishes?17:47
mordreds/climant/climate/17:48
mtreinishAlex_Gaynor: https://review.openstack.org/#/c/41797/ that will drop that reset time down17:48
Alex_Gaynormaybe not today, but I've definitely seen it before17:48
mtreinishbut at the cost of a bit more flakiness17:48
Alex_Gaynormtreinish: only one way to find out!17:48
Alex_Gaynor(if it's worth it)17:48
*** afazekas has joined #openstack-infra17:48
Alex_Gaynormtreinish: we going to land that once check passes?17:48
mtreinishAlex_Gaynor: we can, but I wasn't planning on doing it until 2 race fixes get through the gate (we can just stack it on the end)17:49
mtreinishhere's the graphs I've been watching https://tinyurl.com/kmwsvob17:50
Alex_Gaynormtreinish: probably best to wait for those to be fully landed, given how the gate is right now :/17:50
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278417:51
mtreinishAlex_Gaynor: yeah, the other problem is those 2 reviews don't fix the 3 most common flaky parallel fails I've been seeing in the gate pipeline17:51
clarkbpleia2: ^ I think that should work. you can't -m an existing thing so I do -a and if that fails -m17:51
pleia2clarkb: "can't -m an non-existing thing" I think you mean, but yes, good call17:52
* pleia2 reviews17:53
clarkbpleia2: yah non-existing. I can type I swear17:53
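The shell logic that patchset wraps in a puppet exec, more or less (4443 as the example; 8080 needs -m because selinux already hands it to http_cache_port_t, per pleia2 above):
    semanage port -a -t http_port_t -p tcp 4443 || \
        semanage port -m -t http_port_t -p tcp 4443
    semanage port -l | grep '^http_port_t'   # verify the final assignment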
clarkbwoot dependency cycle17:53
*** dina_belova has joined #openstack-infra17:54
pleia2clarkb: how are we handling git daemon's port?17:55
pleia2my patch?17:57
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278417:57
clarkbpleia2: ya17:57
pleia2ok cool17:58
clarkbwhich is working fine best I can tell17:58
*** ruhe has quit IRC17:58
mordredAlex_Gaynor, pleia2, clarkb can I get a read on this before I send it to the dev list?17:59
Alex_Gaynormordred: what's "this"?18:00
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278418:00
mordredhaha18:00
mordredhow about I paste the link18:00
pleia2:)18:00
mordredhttp://paste.openstack.org/show/44785/18:00
Alex_Gaynorwoudln't hurt :)18:00
mordredI want to be clear, not too bitchy or accusing, and also not indicate panic18:00
clarkbI am going to kill this dependency cycle darnit18:00
mordredclarkb: I believe in you18:01
pleia2mordred: looks good to me18:01
Alex_Gaynormordred: looks good to me18:01
mordredthanks18:01
jeblairdoes anyone want to become (even more of) a git expert?18:02
mordredjeblair: sure18:02
*** zehicle_at_dell has quit IRC18:02
jeblairi think we need to get a handle on the refs/changes issue18:02
*** AJaeger has quit IRC18:03
jeblairbecause a very simple test (cloning nova with and without refs/changes) is about a 2x difference in speed18:03
mordredas in, how that affects a remote update?18:03
jeblairbut it's _complicated_18:03
anteayamordred: I would reiterate your tl;dr before you sign off18:03
anteayajust in case they love your prose so much, they forget the point18:03
jeblairso i don't want a simple "oh, let's just not replicate refs/changes" before we _understand_ it18:03
mordredjeblair: can you give a summary of what's complicated?18:04
jeblairthings that may impact the issue are whether the refs are in the repo at all, whether they are there and packed, and whether our clients or servers are appropriately (not) advertising them on initial connect18:04
jeblairsee this thread: http://thread.gmane.org/gmane.comp.version-control.git/126797/focus=12705918:04
jeblairi don't know if that landed, or what18:05
jeblairanyway, i will get around to understanding that, but i don't want that to distract from our work on 'just add more mirrors of what we have' for now18:05
jeblairso if anyone makes some headway into that before we get to that optimization point, it would be useful18:06
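One way to reproduce the comparison jeblair mentions (a sketch; the change-ref refspec below is the usual Gerrit layout rather than anything quoted from the log):
    # plain clone: branches and tags only
    time git clone git://git.openstack.org/openstack/nova nova-plain
    # same clone, then also fetch every replicated change ref
    time git clone git://git.openstack.org/openstack/nova nova-changes
    cd nova-changes
    time git fetch origin '+refs/changes/*:refs/remotes/origin/changes/*'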
*** fbo_away is now known as fbo18:06
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278418:06
mordredjeblair: I will read that and other things and see if I can drop some knowledge18:07
jeblairmordred: awesome, thx18:08
clarkbthat last patchset makes me really sad18:08
clarkbI am running bash in a puppet exec so that I can easily negate the return code of a command in the onlyif18:08
clarkbof course I probably forgot to update the path and it will fai18:08
*** rnirmal has joined #openstack-infra18:08
* anteaya hands clarkb an "l"18:09
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278418:09
*** datsun180b has quit IRC18:11
mordredjeblair: jumping to thoughts - what if we make our remote refspecs on the build slaves more specific18:12
*** cthulhup has joined #openstack-infra18:13
jeblairmordred: maybe; i'd want to understand what it's doing now though (what does git remote update do?  how does it relate to the (non-)advertisement of refs?)18:14
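What a more specific refspec on a slave could look like (illustrative only; the wildcard line is what a normal clone configures):
    # default: fetch everything under refs/heads
    git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
    # narrower: only the branch the job actually tests
    git config remote.origin.fetch '+refs/heads/master:refs/remotes/origin/master'
    git remote update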
*** marun has quit IRC18:14
anteayamordred jeblair not sure if this helps or not, but in this post I did as an intro to git I posted the changes to refs and logs/refs as I went along: http://anteaya.info/blog/2013/02/26/the-structure-and-habits-of-git/18:14
clarkbjeblair: that latest patchset mostly works. I am not entirely convinced it will restart apache before attempting to start haproxy, but we can do multiple passes really quickly if we need it18:14
clarkbjeblair: it's tricky to get that right because I kept running into dependency cycles.18:14
clarkbjeblair: but the 15g node is now running apache and git-daemon behind haproxy18:14
jeblairclarkb: you should be able to clone nova18:15
anteayabut I didn't create or track remote branches or refs, so I don't answer that question18:15
jeblair(the others don't exist)18:15
clarkbjeblair: ok testing18:15
clarkbjeblair: git clone git://162.209.12.127/openstack/nova works18:16
adalbashi! some jobs in the gate (looking at devstack-testr-vm-full) are showing this error 'ERROR:root:Could not find any typelib for GnomeKeyring'. Anyone noticed that and know what this is about?18:16
*** xBsd has joined #openstack-infra18:17
jeblairclarkb: http?  use  GIT_SSL_NO_VERIFY=true18:17
clarkbjeblair: and https is failing because the development hiera does not have the ssl cert18:17
anteayaadalbas: that is a bug18:17
jeblairclarkb: are you sure?  i thought dev hiera was prod hiera?18:17
clarkbjeblair: I don't think it is, but I will double check18:18
anteayait shouldn't affect the outcome of the tests adalbas18:18
adalbasanteaya, yeah, i realized that. Is there a bug opened for that anyway?18:18
clarkbjeblair: nevermind it is a symlink. I will look into this more closely18:18
anteayaadalbas: looking18:18
*** fbo is now known as fbo_away18:18
jeblairclarkb: it _should_ install the cert for git, which you should be able to ignore with that env var18:19
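For the record, the override jeblair suggests is per-invocation; a sketch against the test node's address:

    # Ignore the certificate mismatch for this one clone of the test host.
    GIT_SSL_NO_VERIFY=true git clone https://162.209.12.127/openstack/nova

    # Equivalent persistent (and broader) setting, if ever wanted:
    git config --global http.sslVerify false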
*** marun has joined #openstack-infra18:19
adalbasanteaya, i found this one: https://bugs.launchpad.net/devstack/+bug/119316418:19
uvirtbotLaunchpad bug 1193164 in devstack "GnomeKeyring errors when installing devstack" [Undecided,New]18:19
*** boris-42 has joined #openstack-infra18:19
clarkbjeblair: it isn't installing the cert at all so we can't ignore the error (I think apache is failing to do anything at that point)18:19
anteayaadalbas: that's the one18:20
adalbasanteaya, tks!18:20
anteayaadalbas: np18:20
clarkbjeblair: error does change when using the GIT_SSL_NO_VERIFY flag18:20
ttxjeblair: about mordred's suggestion of not approving before checks are run... is it something we could enforce ? I can see benefits for it even outside of the FF craze.18:21
*** xBsd has quit IRC18:21
jeblairttx: probably; occasionally it's useful.  worth thinking about18:22
Alex_Gaynorttx: So, FWIW when I first got involved in OpenStack, the way I thought it worked was that there wasn't an explicit "Approve" state, that instead stuff was approved when jenkins passed and it had the needed +2s. Such a model might be interesting to explore.18:22
clarkbjeblair: https://162.209.12.127/openstack/nova/info/refs not found falling back on the dumb client?18:23
mordredAlex_Gaynor: that's where we started, actually18:23
ttxAlex_Gaynor: we kinda want the APRV because sometimes there is a timing constraint. So you can have two +2s but be waiting for something to happen before hitting APRV18:23
*** AJaeger has joined #openstack-infra18:23
mordredthat too. but the effect on the gate would be largely the same if we triggered a gate run directly on the second +218:24
Alex_Gaynorhow many builders do we have right now for non-devstack builds?18:24
mordredwhich is that the second +2 could jump the initial vrfy and trigger the gate testing anyway18:24
ttxmordred: it wouldn't be completely insane to require that check tests pass before adding something to the gate queue. At least for some pipes18:25
jeblair(also, why would you never want more than 2 core reviewers to review something?)18:25
clarkboh I know. I need to put git.openstack.org in the request. /me edits /etc/hosts locally18:25
*** fbo_away is now known as fbo18:25
*** datsun180b has joined #openstack-infra18:25
*** xBsd has joined #openstack-infra18:27
*** woodspa has joined #openstack-infra18:28
jeblairclarkb: why?18:31
jeblairclarkb: (the other servers don't require that)18:31
clarkbjeblair: because the 4443 vhost is for git.openstack.org otherwise you get the default vhost18:32
jeblairclarkb: why don't we make the 4443 accept all hostnames?18:33
clarkbjeblair: we can do that as well. Remove the default vhost and put a * in the git.openstack.org vhost18:33
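What clarkb and jeblair settle on here amounts to making the SSL vhost a catch-all. A minimal sketch of the Apache side (the port, paths and surrounding directives are assumptions, not the actual puppet template):

    <VirtualHost *:4443>
        ServerName git.openstack.org
        ServerAlias *
        SSLEngine on
        # ... certificate paths, cgit and git-http-backend configuration ...
    </VirtualHost>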
clarkbbut now I appear to have haproxy logging issues. It wants to log to rsyslog via udp18:34
clarkberror: gnutls_handshake() failed: A TLS warning alert has been received. is the current error18:34
mordredjeblair: ok. I think I have learned new things18:41
reedbbl18:41
*** reed has quit IRC18:41
jeblairclarkb, mordred: https://etherpad.openstack.org/git-lb18:42
jeblairdinky benchmarks18:42
jeblairi think we should use 8g nodes instead of 30g; and lots of them.18:43
jeblairmordred: what have you learned?18:44
clarkbwow those numbers are very close to each other18:44
sdake_is the gate broken ?18:44
mordredjeblair: ah. nope.18:45
mordredjeblair: I did not learn something18:45
anteayasdake_: what do you see that you ask the question?18:46
anteayathe gate is very very slow but it should still be running18:46
jeblairmordred: (i have learned that git.o.o has a partial packed-refs file; i suspect it has something to do with how it was created (maybe an initial git clone --mirror or something))18:46
jeblairmordred: 28k refs are in packed refs, 9k are loose18:47
mordredjeblair: interesting18:47
jeblairmordred: review.o.o is all unpacked18:47
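The packed/loose split being compared here can be inspected (and changed) directly in a bare repo; a sketch, with the on-disk path assumed:

    cd /var/lib/git/openstack/nova.git   # path is an assumption

    # loose refs are individual files under refs/
    find refs -type f | wc -l

    # packed refs are lines in packed-refs (skip the header and ^peeled lines)
    grep -vc '^[#^]' packed-refs

    # pack all remaining loose refs (tags included) into packed-refs
    git pack-refs --all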
anteayathe check queue however is filled with unknown rather than a time18:47
mordredjeblair: I'm breaking down and asking spearce questions directly18:47
jeblairanteaya: waiting on centos nodes for py26 tests18:47
sdake_anteaya apparently heat gate jobs are going slowly18:47
sdake_but they appear to make progress according to devs in the heat channel - but thanks for responding18:47
anteayasdake_: yes all gate jobs are going slowly18:48
anteayayes absolutely18:48
jeblairwhich probably means we should add more centos nodes18:48
anteayajeblair: great thanks18:48
anteayago go centos nodes18:48
jeblairclarkb: they are actually close enough that i want to spin up a 4g and 2g node (they both have 2vcpus; half of 8g's 4vcpu)18:50
*** cthulhup has quit IRC18:51
clarkbjeblair: good idea18:52
clarkbI am going to stop using haproxy for the http to https redirect. I don't think that works with the tcp mode18:53
mordredjeblair: best I can tell, the patch did not land, nor any patches like it18:53
*** danger_fo_away is now known as danger_fo18:54
jeblairmordred: :(18:54
clarkbI had a hunch this would be the case which is why I kept the 8080 vhost18:54
mordredjeblair: I'm continuing to dig though18:54
jeblairthose are launching now; i need to get exercise and lunch; should be back in about 1 hour18:54
*** sarob has joined #openstack-infra18:59
clarkbremoving the default virtualhost and matching * on the git.o.o vhost makes things work for some reason. I am not complaining, patchset incoming19:04
mordredjeblair: uploadpack.hiderefs19:08
mordredjeblair: it's in 1.8.219:09
mordredwhich means we'd almost certainly want the fetch-from repos to be on precise so that we could install latest git from the git ppa19:10
mordredclarkb: ^^19:10
clarkbmordred: ugh19:10
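For context, the option mordred found is a server-side setting new in git 1.8.2: refs matching the given prefix are no longer advertised to fetch/clone clients, so a plain clone would skip Gerrit's refs/changes namespace entirely. A sketch of how it could be set on the mirror servers, with the caveat that (at least in 1.8.2) hidden refs can no longer be fetched explicitly either:

    # git >= 1.8.2 only; stops upload-pack from advertising change refs.
    git config --system uploadpack.hiderefs refs/changes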
*** gordc has quit IRC19:10
clarkbI think we either get cgit on precise or new git on centos19:11
mordredoh yeah?19:11
clarkbbecause those seem less painful than a complicated proxy mess to send cgit to centos boxes and everything else to precise boxes19:11
mordrednod19:11
mordredhow awful is getting git >=1.8.2 on centos along side of our cgit install?19:12
mordredpleia2: ^^ ?19:12
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278419:13
pleia2mordred: pretty awful19:13
mordredSWEET19:13
pleia2would have to load up a 3rd party rpm, which makes me :(19:13
mordredwhere are we getting cgit from?19:14
pleia2epel19:14
mordredepel has cgit and not git >=1.8 ?19:14
pleia2as I understand it, epel is just "other stuff" not so much backports19:14
clarkbugh I just derped and started a git fetch of 42784 into the hiera repo...19:14
* clarkb makes a note to clean up that repo when this is all done19:15
clarkbhiera itself should be fine as I didn't check out anything in that repo19:15
mordredpleia2: I understood the opposite - that epel is backports of current fedora for old centos/rhel19:15
mordredbut - I REALLY don't understand19:15
*** AJaeger has quit IRC19:15
clarkbmordred: maybe you can take a look at that repo to make sure I didn't hose anything? second set of eyes and all that19:15
mordredclarkb: all you did was fetch?19:16
pleia2Does EPEL replace packages provided within Red Hat Enterprise Linux or layered products?19:16
pleia2No. EPEL is purely a complementary repository that provide add-on packages.19:16
clarkbmordred: yes19:16
*** zaro has joined #openstack-infra19:17
anteayahey zaro19:17
mordredok. then I do not have a good answer19:17
clarkbmordred: http://paste.openstack.org/show/44788/ I ^C'd before the checkout19:17
pleia2I mean, we can just use an rpm19:17
mordredwell, cgit is compiled against git19:18
pleia2oh, that19:18
mordredisn't it? so wouldn't that screw the cgit install too?19:18
pleia2I'm not sure19:18
mordredor - wait - no, they do static linking19:18
mordredthat's why it's not in ubuntu19:18
pleia2er, hooray for static linking?19:18
pleia2:)19:18
mordredclarkb: yeah. you're fine19:18
pleia2I can find a nice looking rpm and install it on my test system19:19
clarkbmordred: ok, is that something we should git gc?19:19
mordredclarkb: not this week19:19
mordred:)19:19
*** beagles has quit IRC19:20
clarkbya I am not terribly worried about it, but I should probably clean that up at some point. I will write a note on the whiteboard19:20
clarkbjeblair: ^19:20
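Since nothing was checked out, the accidental fetch only left objects reachable from FETCH_HEAD, so the eventual cleanup clarkb is noting down could be as small as this (repository path assumed):

    cd /path/to/hiera        # path is an assumption
    rm -f .git/FETCH_HEAD    # drop the only reference to the fetched change
    git reflog expire --expire=now --all
    git gc --prune=now       # discard the now-unreachable objects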
*** sarob has quit IRC19:21
*** pblaho has joined #openstack-infra19:21
*** sarob has joined #openstack-infra19:21
zaroanteaya: hello!19:22
anteayawelcome to the party19:23
anteayajeblair noticed that when restarting zuul a gearman thread dropped resulting in slaves sticking around and tests running on them, but they were orphaned19:23
anteayaso the logs from the tests were lost19:24
anteaya<jeblair> mordred, clarkb, zaro: when the gearman server restarts, i think the executorworkerthread dies, which means the offline-on-complete feature fails19:25
anteaya* xBsd has quit (Quit: xBsd)19:25
anteaya<jeblair> mordred, clarkb, zaro: which is why a lot of jobs are showing up as lost right now -- they are re-running on hosts that should have been offlined19:25
anteaya* michchap (~michchap@60-242-111-85.tpgi.com.au) has joined #openstack-infra19:25
anteaya<jeblair> so for the moment, if we stop zuul, we need to delete all the slaves19:25
*** AJaeger has joined #openstack-infra19:25
*** AJaeger has joined #openstack-infra19:25
anteayazaro from about 4.5 hours ago19:25
*** beagles has joined #openstack-infra19:25
*** sarob has quit IRC19:25
clarkbok lunch time back shortly19:28
zaroanteaya: sorry i missed it all.  i was deep in gerrit.19:28
anteayaclarkb: happy lunch19:28
anteayazaro: understandable19:28
zaroanteaya: lunching with clarkb so will think about it after food.19:28
anteayazaro: happy food19:28
ttxanteaya: the "available test nodes" graph at bottom of zuul status page looks a bit funny. Since you've been following the action, is it considered normal ?19:29
pleia2tsk, RPMForge is popular but for centos they only have up to git 1.7.1119:29
ttx10 is the new 019:29
anteayattx: I asked the same question at the start of my day today19:30
anteayait means that we are using all available nodes, really19:31
anteayaso yes 10 is the new 019:31
anteayattx when I asked jeblair the same question this morning he responded with this image: http://graphite.openstack.org/render/?from=-24hours&fgcolor=000000&title=Test%20Nodes&_t=0.8664466904279092&height=308&bgcolor=ffffff&width=586&until=now&showTarget=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.ready%29%2C%20%27devstack-precise%27%29%2C%20%27green%27%29&_salt=1376751567.43&target=alias%28sum19:31
anteayaSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.building%29%2C%20%27Building%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.ready%29%2C%20%27Ready%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.used%29%2C%20%27Used%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.delete%29%2C%20%27Delete%27%29&areaMode=stacked19:31
anteayaoh goodness sorry about that19:32
anteayaick19:32
ttxok, was expecting something like this. "free node" graphs are always a bit funny in a dynamic allocation system19:32
anteayayes19:32
ttxanteaya: could you tinyurl that for me ?19:32
anteayattx: https://tinyurl.com/kmotmns19:33
anteayabetter19:33
mordredok. plane landing. I _may_ get on for a minute at the hotel tonight, but in general I'm switching to driving large trucks and building steel structures in the hot sun19:34
mordredand, you know, burning the man19:35
anteayahappy sand, mordred19:35
pleia2mordred: have fun! (or whatever you're supposed to have at burning man :))19:36
anteayawhatever it is, it doesn't include water or shade19:36
*** vipul is now known as vipul-away19:36
*** vipul-away is now known as vipul19:36
*** boris-42 has quit IRC19:37
ttxInteresting thing... FeatureProposalFreezes should overflow the checks, not the gate pipeline. FeatureFreeze will overflow the gate pipeline. That should really be fun19:38
ttx(i.e. people are supposed to propose stuff, not so much approve them)19:39
anteayawhen is the date for FeatureProposalFreezes?19:40
mgagneanteaya: August 21 for nova and cinder -> https://wiki.openstack.org/wiki/Havana_Release_Schedule19:41
mgagneanteaya: today =)19:41
anteayamgagne: thank you19:41
anteayaah ha19:42
anteayafunny it has been neutron and heat we have heard from today19:42
anteayacinder and nova have been relatively quiet in this channel19:43
*** melwitt has joined #openstack-infra19:43
pleia2clarkb: so the only reasonable, new git rpms that people use are from http://pkgs.repoforge.org/git/ (might find some random ones on some-person's-blog if I search more, but I haven't yet, and even then...), repoforge only goes up to 7.11, the other option is installing from source :\19:44
pleia2er, 1.7.1119:44
*** pblaho has quit IRC19:45
* pleia2 lunch19:46
anteayahappy lunch19:46
anteayaguess it is just me right now19:46
anteayattx are stackforge projects affected by feature freeze? like savanna and murano?19:47
ttxno, only the integrated projects19:47
ttxi.e. the ones that do a common release19:47
*** arezadr has joined #openstack-infra19:48
anteayado you think there would be offence taken if stackforge projects were asked to submit patches on a critical basis only right now?19:49
anteayathen if something is non-critical it could wait until after the rush19:49
ttxthat's not really the concept that was sold to them, and unfortunately we are far from the activity peak19:50
ttxie. Feature Freeze is actually two weeks away.19:50
ttxWe can't ask them to hold for two weeks.19:51
anteayaI'm seeing a lot of nova/heat/cinder/neutron patches so that is as expected19:51
jeblairanteaya: out of the 200 changes in zuul, ~40 are stackforge, and they run simple/fast jobs.  i don't think it's worth it.19:51
anteayattx fair enough19:51
anteayajeblair: ah stats thank you19:51
*** vipul is now known as vipul-away19:51
anteayathe question just floated through my head so I thought I would give it voice19:51
anteayajeblair: mordred found a git fix but it requires git 1.8.2 which requires installing a third party rpm for cgit and even then it appears the package is not available19:52
*** wenlock has joined #openstack-infra19:52
wenlockhi all19:53
*** vipul-away is now known as vipul19:53
wenlockquestion about hiera config, was looking for a sample... finally got some time to dig back into this today19:54
jeblairanteaya: i saw19:54
*** thomasbiege has joined #openstack-infra19:55
anteayak19:55
anteayawenlock: hello what is the question?19:55
wenlockmaybe enough to get me started with wiki19:55
*** thomasbiege has quit IRC19:57
*** cyeoh has quit IRC19:57
*** chuckieb|2 has quit IRC19:58
*** koobs` has joined #openstack-infra19:58
jeblairit actually merged 11 changes in the past hour; i think it just got 11 more added to the gate queue.19:58
*** cyeoh has joined #openstack-infra19:58
*** koobs has quit IRC19:58
*** jhesketh has quit IRC19:59
anteayahow does 11 merges in the last hour compare with prior hours?19:59
anteayaare we getting better or staying the same?19:59
*** jhesketh has joined #openstack-infra20:00
jeblairanteaya: we haven't done anything to make it better yet so it's not worth looking.  i mostly wanted to see if it was functioning at all, and it is.  so it's back to scaling git.o.o now.20:01
anteayaah okay20:01
jeblairanteaya: (it's in graphite if you wanted to play with it; i don't have a link, i was grepping logs because i was looking for errors)20:02
*** cthulhup has joined #openstack-infra20:02
anteayajeblair: I have forgotten how I get to graphite20:02
jeblairanteaya: graphite.openstack.org20:03
anteayathat would be it, thanks20:03
*** thomasbiege has joined #openstack-infra20:03
*** linggao has joined #openstack-infra20:03
jeblairclarkb: we should consider using the private interfaces for git haproxy (but that only works within a DC; and we should also test to see which is actually faster)20:04
*** mrodden has quit IRC20:05
*** hartsocks has joined #openstack-infra20:05
*** thomasbiege has quit IRC20:06
*** cthulhup has quit IRC20:06
linggaoHi clarkb, I accidently added a patch 10 to someone's code in review. I meant only depend on his code.20:09
linggaoclarkb, how do I remove patch 10 in https://review.openstack.org/#/c/40844/ ?20:09
linggaoclarkb, NobodyCam told me to ask you about it.20:10
jeblair#status ok20:11
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config"20:11
*** afazekas has quit IRC20:13
*** yolanda has joined #openstack-infra20:13
*** rnirmal has quit IRC20:18
clarkbjeblair: did you catch my hiera data repo derp in scrollback? don't let me forget to clean that up at some point when things are quieter20:18
clarkbjeblair: tl;dr I fetched a ref from openstack-infra/config into that repo http://paste.openstack.org/show/44788/ because I was in the wrong PWD when running that command20:19
clarkblinggao: there is no way to remove patch 10. You can only push a patch 11 that restores patchset 920:19
jeblairclarkb: yep20:19
clarkbjeblair: I am reading up on private interfaces now. 162.209.12.127 has the latest patchset of my change applied to it and is working fine20:20
*** morganfainberg|a is now known as morganfainberg20:20
linggaoclarkb: thanks. I'll do that to repair the damage.20:20
*** dkliban has quit IRC20:20
clarkbjeblair: oh you mean the rax private interfaces20:20
jeblairclarkb: yeah20:20
clarkbjeblair: do our firewall rules apply to both interfaces? if so it is just a matter of putting those IPs into the balance member IP list20:21
jeblairclarkb: well, we need to decide if we want to use them first20:22
jeblairclarkb: you want to run a quick benchmark beetween the 15 and 30 g test nodes i set up?20:22
jeblairclarkb: (note, they are in ORD, not DFW)20:22
clarkbjeblair: I will set up 162.209.12.127 to balance across that node and the 30G node on their private interfaces then switch to public20:22
jeblairclarkb: ok, i was just thinking do a quick git clone from one to the other to see if you notice a diff20:24
clarkbjeblair: without haproxy?20:24
clarkbI can do that too20:24
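The quick comparison jeblair suggests is just a pair of timed clones between the two test nodes, one per interface; the first address is the public test IP from this channel, the second is a placeholder for the ServiceNet/private one:

    # run on the 30g node, cloning from the 15g node
    time git clone git://162.209.12.127/openstack/nova nova-public
    time git clone git://10.0.0.2/openstack/nova       nova-private   # placeholder private IP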
jeblairclarkb: updated https://etherpad.openstack.org/git-lb20:26
jeblair2g has the highest 'clients served per gb' ratio20:26
*** hashar has joined #openstack-infra20:28
jeblairand overall it correlates very closely to 1/1 client/cpu  (with 8g being able to serve 1.5 clients per cpu)20:28
*** hashar has left #openstack-infra20:28
jeblair(but slowly)20:28
Alex_Gaynorevent/result queues on zuul seem to be rising20:30
jeblairAlex_Gaynor: thanks, i'll take a look20:30
*** rfolco has quit IRC20:31
zarojeblair: i can't seem to repro executorworkers stopping when gearman server restarts.  are you still seeing this?20:34
*** yolanda has quit IRC20:34
jeblairzaro: the problem is that the node was not taken offline20:34
jeblairzaro: the rest is speculation20:35
*** hartsocks has left #openstack-infra20:35
zarojeblair: node not taken offline due to restarting gearman server?20:35
odyssey4me4join #chef20:35
odyssey4me4hahaha, oops20:35
jeblairzaro: when the gearman server was taken offline, nodes that were running jobs were not set offline20:36
*** p5ntangle has joined #openstack-infra20:36
zarojeblair: ahh, ok.20:36
*** rnirmal has joined #openstack-infra20:38
*** UtahDave has joined #openstack-infra20:38
jeblairclarkb: i'm going to try your signal patch now, but i expect it to kill the gearman server20:41
jeblairclarkb: which means we'll get one thread dump and then we get to restart zuul20:42
clarkbjeblair: ok20:42
clarkbalso :(20:42
*** p5ntangle has quit IRC20:44
anteayajeblair: should we have a status update for the channels?20:44
clarkbanteaya: I think we can do that once the gearman server falls over20:44
clarkbanteaya: if we don't recover cleanly20:44
anteayaokay20:44
*** AJaeger has quit IRC20:46
*** odyssey4me4 has quit IRC20:46
jeblairclarkb: i don't think it fell over20:47
clarkbjeblair: did you get a stack dump?20:48
jeblairclarkb: i have 2 of them, so far; slightly different, and useful20:48
clarkbjeblair: best I can tell there isn't a real difference between private or public interfaces on those boxes20:49
clarkbI updated the etherpad20:49
*** thomasbiege has joined #openstack-infra20:49
*** thomasbiege has quit IRC20:50
jeblairclarkb: i'm switching to intense zuul hacking; i'd lean toward going with public and proceeding with the plan20:50
clarkbok, I am going to test ipv6 now. as I noticed that wasn't working20:51
clarkband git.o.o has a AAAA record so it should be made to work20:51
jeblairclarkb: but we can haproxy over v4, yeah?20:51
clarkbjeblair: yeah this is just for the frontend listen directives20:51
jeblairclarkb: (i mean, if somethings broke, probably worth looking into)20:51
jeblairok, yeh20:51
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.  https://review.openstack.org/4278420:52
*** apcruz has quit IRC20:54
*** lbragstad has quit IRC20:55
jeblairclarkb: ok, i think i have what i need to hack on the zuul problem20:56
jeblairclarkb: i don't believe it's going to get better (it might, after a long time, increment through the loop again)20:56
jeblairclarkb: so we should go ahead and stop it, which as we learned, means some cleanup work.20:56
jeblairclarkb: up for helping?20:56
clarkbjeblair: sure20:57
*** dkliban has joined #openstack-infra20:57
clarkbjeblair: I think 42784 is just about ready. Need to test thta that works over ipv6 now but git clone doesn't like ipv6 addresses in its url20:57
clarkbit splits on ':'20:57
jeblairah20:57
clarkb*splits on ':' and treats the right hand side as the port20:57
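The usual workaround for literal IPv6 addresses is to bracket them, RFC 3986 style, or to give the address a name in /etc/hosts; whether the bracketed form works for every transport in the git versions involved here is an assumption worth checking:

    # bracket the literal address so the colons are not read as host:port
    # (2001:db8::1 is a documentation-range placeholder)
    git clone 'https://[2001:db8::1]/openstack/nova'

    # or give the v6 address a name and clone by hostname instead
    echo '2001:db8::1 git-v6-test' >> /etc/hosts
    git clone https://git-v6-test/openstack/nova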
jeblairclarkb: will you log into jenkins02?20:57
clarkbjeblair: I am in20:58
jeblairclarkb: abort all jobs.  :)20:58
clarkboh you mean through web ui?20:58
jeblairclarkb: yep20:58
clarkbI ssh'd in >_>20:58
clarkbjeblair: should I put it in shutdown mode first to prevent more jobs from starting?20:58
jeblairclarkb: i stopped zuul20:59
clarkbok20:59
clarkbI am aborting jobs now20:59
jeblairclarkb: with any luck, nodepool should clean most of those up21:01
*** cody-somerville has quit IRC21:01
clarkbI am not sure if I should wait after clicking the red button or if I can just spam that. I assume that it is just making a rest call back to jenkins21:02
jeblairclarkb: spam it21:02
*** fbo is now known as fbo_away21:02
jeblairclarkb: when you're done; double check that of all the on-line devstack nodes, none of them has a build history21:03
clarkbjeblair: FYI https://jenkins02.openstack.org/job/gate-neutron-python26/275/ won't die and has been running for hours21:05
jeblairclarkb: i'll look into it21:06
clarkbI am waiting for nodepool to cleanup the nodes now21:06
jeblairclarkb: (it'll add them too, you should end up with 10 online and 5 offline nodes [thanks to az2])21:07
jeblairclarkb: that's nasty; i think we should restart that jenkins master21:08
jeblair(when nodepool finishes)21:08
*** dkranz has joined #openstack-infra21:08
jeblairclarkb: (i killed and relaunched the slave and that  build is still stuck)21:08
clarkbjeblair: restarting the master wfm21:09
clarkbjeblair: I assume we will wait for nodepool to settle first21:09
*** dprince has quit IRC21:09
jeblairi think we're there...21:09
clarkbyup I am checking build history now21:09
jeblairclarkb: k; you can restart it at will21:10
jeblair#status alert Restarting zuul, changes should be automatically re-enqueued21:10
openstackstatusNOTICE: Restarting zuul, changes should be automatically re-enqueued21:10
*** ChanServ changes topic to "Restarting zuul, changes should be automatically re-enqueued"21:11
clarkbjeblair: build history is all empty. restarting jenkins now21:11
*** mrodden has joined #openstack-infra21:12
jeblairclarkb: ready for me to start zuul?21:13
clarkbjeblair: ya, jenkins is back up21:13
jeblairzuul is up; i've started the reverifies and rechecks (with a 30s delay as earlier)21:14
jeblairthough perhaps i should have done 60s, knowing what we know about git.o.o now21:14
clarkbI just ran into the cannot fetch idx thing cloning from the 15g test node on the 30g test node...21:15
clarkbthis was over ipv621:15
clarkbthrough haproxy21:15
*** cody-somerville has joined #openstack-infra21:15
clarkbare we not able to pack up all of the refs before the http timeout?21:15
jeblairi have not seen that in isolated testing21:16
clarkbyou know, I wonder if the centos git slowness has anything to do with ipv621:16
*** dina_belova has quit IRC21:16
clarkbbecause it is being really slow too21:16
*** eharney has joined #openstack-infra21:16
clarkbcloning with git:// over ipv6 worked fine21:17
jeblairi'm going to switch to zuul hacking to try to squash this bug before the next time we have to restart it21:17
clarkbok21:17
clarkbpleia2: are you around?21:17
clarkbpleia2: any chance you can try and corroborate that git cloning on centos is slow when using ipv6 but not when using ipv4?21:18
pleia2clarkb: hey21:18
clarkbpleia2: cloning against review.o.o should be sufficient to test that21:18
pleia2clarkb: ok, will do21:18
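Git itself has no IPv4/IPv6 switch in these versions, so one way to run the comparison is to pin the hostname to a single address family in /etc/hosts between timed clones; the addresses below are placeholders and the /p/ clone path is an assumption:

    # /etc/hosts, one line active at a time (substitute the real A / AAAA records):
    #   203.0.113.10   review.openstack.org    <- IPv4 run
    #   2001:db8::10   review.openstack.org    <- IPv6 run

    time git clone https://review.openstack.org/p/openstack/nova nova-timed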
*** gordc has joined #openstack-infra21:19
clarkbthank you21:20
clarkbpleia2: fwiw it is consistent on these test boxes21:20
clarkbI am testing ipv4 again to make sure it isn't some other external thing being weird21:21
pleia2clarkb: wait, running git clone *on* centos or to a git server on centos?21:23
*** dkranz has quit IRC21:23
clarkbpleia2: git clone on centos21:24
clarkbpleia2: as our centos slaves are slow cloning from review.o.o21:24
pleia2clarkb: only have an hpcloud account, no ipv621:24
clarkboh21:24
*** pabelanger has quit IRC21:24
clarkbI am seeing the same slowness again with ipv4 now. I am going to test cloning from my local box now21:24
pleia2I have several hosts that do have ipv6, but all debian and ubuntu21:25
*** reed has joined #openstack-infra21:25
clarkbwow this is so weird. On the rax test centos box ipv4 clone timed out too then did the cannot find idx pack file thing21:26
clarkbbut I run the same clone on my local precise box and clone all of nova in ~45 seconds21:27
clarkbgit:// works just fine on centos though21:27
*** xBsd has quit IRC21:28
jeblairclarkb: remember that i was able to do 6 simultaneous clones over v4 https to the 8g box21:28
jeblair(without error)21:28
clarkbI am going to bypass haproxy now to see if that is tickling the issue21:28
clarkbjeblair: were you running the clones on centos?21:28
jeblairclarkb: no, on precise21:29
clarkbjeblair: I think this is the centos slowness remanifesting itself21:29
jeblairinteresting21:29
clarkbbecause my precise box is fine21:29
jeblairclarkb: er, is the issue that centos git does not speak the smart http protocol?21:30
clarkbjeblair: no, centos git does speak smart http protocol21:30
*** fbo_away is now known as fbo21:30
clarkbit is 1.7.1 iirc and smart http went in 1.6.something21:30
clarkbI will double check that though21:30
jeblairmaybe it doesn't speak it well.21:30
clarkbcould be21:30
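Whether a given server/proxy combination is actually answering the smart protocol shows up in the very first request, so it can be checked without a full clone; a sketch, assuming curl is available on the slave:

    # A smart-HTTP server answers the probe with a pack advertisement;
    # a dumb fallback returns a plain text ref list.
    curl -s -D - -o /dev/null \
      'https://git.openstack.org/openstack/nova/info/refs?service=git-upload-pack' \
      | grep -i '^content-type'
    #   smart:  application/x-git-upload-pack-advertisement
    #   dumb:   text/plain

    # From git's side, GIT_CURL_VERBOSE shows the same negotiation:
    GIT_CURL_VERBOSE=1 git ls-remote https://git.openstack.org/openstack/nova 2>&1 \
      | grep -i 'content-type'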
*** dkranz has joined #openstack-infra21:31
*** vipul is now known as vipul-away21:32
pleia2it takes over 2 minutes to clone nova over http from review.o.o in a couple places I tested (a debian linode - ipv4&6 and centos hpcloud ipv4)21:34
clarkbjeblair: I can clone directly over ipv4 to apache. It is slow, but it works. I think haproxy must be amplifying some latency21:35
clarkbI am trying to test with ipv6 but our iptables puppet stuff doesn't work correctly on centos for ipv621:35
*** lbragstad has joined #openstack-infra21:35
*** vipul-away is now known as vipul21:35
*** mriedem has quit IRC21:37
*** danger_fo is now known as danger_fo_away21:38
*** dkranz has quit IRC21:43
*** SergeyLukjanov has quit IRC21:45
*** lcestari has quit IRC21:47
clarkbdirect ipv6 is also slow but works eventually21:48
clarkbpleia2: where are those newer versions of git? I am half tempted to try one of them to see if the slowness goes away21:48
pleia2clarkb: debian is 1.7.1021:49
clarkbpleia2: the ones you found for centos21:49
pleia2clarkb: ah, newest one for centos is 1.7.1121:49
anteayaI'm on 1.8.1.2 - ubuntu quantal21:49
*** changbl has quit IRC21:49
jeblairclarkb: does it matter?  i mean, is that the way we're going to solve this?21:49
anteayanot sure if that helps or creates jealousy21:49
pleia2git-daemon is a separate package though, realized I'd need to find a package for that too if it's not included21:50
jeblair(i mean, maybe it'll tell you something, but if the end result of this is 'it might work if we upgrade all the slaves' i think we're digging a bigger hole)21:51
*** fbo is now known as fbo_away21:52
* anteaya heads out for a walk21:54
*** dkranz has joined #openstack-infra21:55
pleia2oh, http://repoforge.org/ is the place for them though21:55
jeblairclarkb: ^21:55
pleia2(I tend to agree with jeblair though)21:55
pleia2usage details on centos6: http://wiki.centos.org/AdditionalResources/Repositories/RPMForge#head-f0c3ecee3dbb407e4eed79a56ec0ae92d1398e0121:56
*** linggao has quit IRC21:56
*** dkliban has quit IRC21:58
*** ^d has quit IRC21:59
*** hashar has joined #openstack-infra22:00
*** hashar has left #openstack-infra22:00
clarkbjeblair: I agree in general too. I am scanning git release notes; there are a few things that pop out as possibly being the cause22:04
*** burt has quit IRC22:04
*** mrodden has quit IRC22:04
*** ryanpetrello has quit IRC22:05
*** dkranz has quit IRC22:06
clarkbjeblair: pleia2 the two items with HTTP in them at https://git.kernel.org/cgit/git/git.git/tree/Documentation/RelNotes/1.7.5.txt seem like possible culprits22:09
*** _TheDodd_ has quit IRC22:11
*** dmakogon_ has left #openstack-infra22:15
pleia2clarkb: first seems like it would be trivial for big clones, not so sure about 2nd, they mention many tags but not big repo (nova does have a fair number of tags, I don't know how many "many" is)22:15
clarkbpleia2: neither do I. I am tcpdumping now and will have to look at this closer22:15
jeblairi don't expect the second to affect a clone22:15
clarkbbecause this will need to be sorted before we can point any of the centos slaves at haproxy22:16
*** dina_belova has joined #openstack-infra22:17
jeblairi think zuul is stuck again; but i haven't been able to repro the problem locally yet22:19
*** dina_belova has quit IRC22:21
*** mrodden has joined #openstack-infra22:23
jeblairi'm going to restart it with some cowboy logging22:26
*** dina_belova has joined #openstack-infra22:27
*** dina_belova has quit IRC22:32
clarkbok22:33
*** prad_ has quit IRC22:33
*** rcleere has quit IRC22:37
*** sarob has joined #openstack-infra22:38
clarkbcomparing tcpdump taken locally and tcpdump on centos the centos client ends up with a window size of 0 frequently which does not happen locally22:41
clarkbI think the client is unable to accept more data for some reason22:42
jeblairis that with haproxy in both cases?22:43
clarkbjeblair: without in the centos case. with locally I should retry locally without haproxy22:45
clarkbI should learn to use punctuation too22:46
anteayaback22:46
*** ftcjeff has quit IRC22:51
clarkbit is related to https somehow22:55
clarkbI gave the http vhost the same git stuff as the https vhost and it is much faster22:55
jeblairclarkb: haproxy?22:55
*** dims has quit IRC22:58
clarkbjeblair: https is slow through haproxy and not through haproxy, but worse through haproxy so definitely possible23:01
clarkbhttp seemed to be much faster through both23:01
*** woodspa has quit IRC23:01
jeblairclarkb: wasn't suggesting a cause, just trying to understand the variables in your experiment23:01
clarkbhttp + haproxy = fast; http = fast; https = slow but does not fail; https + haproxy = even slower, causes git clone to fail23:05
*** rnirmal has quit IRC23:05
clarkband git clone fails because it cannot get the idx files. Looks like the same issue as before where it falls back on non-smart http and the lack of .git in the dir name breaks it23:06
*** senk has quit IRC23:08
jeblairright, so that first connection fails23:08
jeblairon the refs front, i believe the refs on git.o.o are as packed as they are going to be; the items in refs/ are all _directories_ (up to the last component of the path); the refs themselves are in a packed-refs file.23:09
*** dims has joined #openstack-infra23:11
*** mberwanger has joined #openstack-infra23:13
mgagneclarkb: cgit config remove-suffix = 1. This will allow repositories on the filesystem to have a .git suffix but still show without it in the interface and in generated URLs.23:15
jeblairmgagne: i don't think cgit was the problem23:15
jeblairmgagne: the problem was that the rewrite rules needed to support the dumb http protocol don't work23:16
jeblairmgagne: but since we're never supposed to use the dumb http protocol, we're not worrying about fixing it for now (instead, making it reliable enough that the smart http protocol doesn't fail)23:17
mgagnejeblair: that's what I'm trying to find out and stumble on this config23:17
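For reference, the setting mgagne points at is a cgitrc knob; it applies to repositories discovered via scan-path and has to appear before the scan-path line. A sketch with assumed paths:

    # append to /etc/cgitrc; remove-suffix must precede scan-path
    printf 'remove-suffix=1\nscan-path=/var/lib/git\n' >> /etc/cgitrc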
*** jjmb has joined #openstack-infra23:18
*** jhesketh has quit IRC23:18
*** jhesketh has joined #openstack-infra23:20
*** mberwanger has quit IRC23:21
* HenryG wonders if jeblair has created a script to automate "recheck|reverify no bug" submissions. :D23:24
*** datsun180b has quit IRC23:27
*** dina_belova has joined #openstack-infra23:28
*** pabelanger has joined #openstack-infra23:28
*** eharney has quit IRC23:30
sarobwhats the pipeline ETA?23:31
Alex_Gaynorhmm, so there's stuff that has all its builds complete, but is still hanging out at the top of the pipeline, is that because of git?23:32
*** dina_belova has quit IRC23:32
*** gordc has quit IRC23:33
anteayasarob: no ETA23:35
anteayaif it makes it through we are happy23:35
anteayawe have had to restart zuul three times today23:35
anteayanot our best day23:35
jeblairAlex_Gaynor: no, i believe we are reliably reproducing the zuul bug now23:37
Alex_Gaynorjeblair: ah!23:37
anteayajeblair: yay, did you cowboy logging result in more bug information?23:38
anteayas/you/your23:38
*** jjmb has quit IRC23:38
jeblairanteaya: a bit too much, i'm afraid23:38
jeblairi've stopped zuul again23:39
anteaya:(23:39
jeblairit will take me a few mins to process this and figure out what's going on23:40
*** jcooley has quit IRC23:41
anteayak23:42
anteayawell that seems like time well spent because this zuul bug keeps showing up23:43
*** jcooley has joined #openstack-infra23:44
*** zul has quit IRC23:47
sarobno sweat guys. you do an awesome job of keeping stuff humming23:52
anteayathanks sarob, jeblair may have found our elusive zuul bug23:52
sarobsweet. good luck.23:52
anteayaso hopefully we can keep zuul up longer on the next restart23:53
anteayathanks23:53
anteaya:D23:53
*** zul has joined #openstack-infra23:53
