Thursday, 2013-08-22

*** sarob_ has joined #openstack-infra00:01
jeblairthat didn't quite tell me what i needed; restarted with more logging00:01
*** nati_uen_ has joined #openstack-infra00:02
anteayayay restart00:02
*** sarob has quit IRC00:04
*** nati_ueno has quit IRC00:05
*** michchap has joined #openstack-infra00:05
*** weshay has quit IRC00:05
clarkbjeblair: I feel like trying to debug this git thing is taking too much time. Everything works but https clone from centos clients00:06
clarkbjeblair: we can either make http available too, clone from /cgit, or use git:// on centos nodes00:07
clarkbcloning from /cgit appears to use the non smart protocol00:07
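For reference, the smart and dumb (non-smart) HTTP protocols can be told apart from the first request a client makes. The client fetches `info/refs?service=git-upload-pack`. A smart server such as git-http-backend answers with the content type `application/x-git-upload-pack-advertisement` and a pkt-line body that starts with `# service=git-upload-pack`. A dumb server just returns the plain refs file. The Python sketch below only encodes that check and does no network I/O. The function names and the example.org URL are hypothetical.

```python
SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"


def discovery_url(repo_url: str) -> str:
    """URL a git client requests first when cloning over HTTP(S)."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_response(content_type: str) -> bool:
    """True if the server answered with the smart-protocol ref advertisement."""
    # Ignore parameters such as "; charset=..." after the media type.
    return content_type.split(";", 1)[0].strip().lower() == SMART_CONTENT_TYPE


def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits giving the total length, prefix included."""
    return b"%04x" % (len(payload) + 4) + payload


if __name__ == "__main__":
    print(discovery_url("https://example.org/openstack/nova"))
    print(is_smart_response(SMART_CONTENT_TYPE))     # True for a smart server
    print(is_smart_response("text/plain"))           # False for a dumb server
    print(pkt_line(b"# service=git-upload-pack\n"))  # b'001e# service=git-upload-pack\n'
```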
clarkbI am going to see if pcrews is still in the office to see if he has any ideas00:08
pcrewsclarkb: /me is at home today :)00:08
jeblairclarkb: ok.  let's make http available as a backup but go back to git://00:08
jeblairclarkb: (with enough capacity to handle it this time)00:08
*** UtahDave has quit IRC00:09
clarkbjeblair: sounds good. I will push one more patchset to enable http then we should be ready to start spinning up nodes00:11
clarkbpcrews: git clone https://foo through haproxy takes a really long time then fails. doing the same clone to the backend server takes a really long time but does not fail00:12
clarkbpcrews: wondering what might cause that and it is only when using https not http and only on centos00:12
pcrews? not a clue00:14
*** senk has joined #openstack-infra00:16
clarkbjeblair: do you want to try doing this tonight?00:17
clarkbfwiw /cgit isn't that much slower. about 2 minutes to clone over https00:18
jeblairclarkb: up to you; i will need to continue to focus on zuul tonight00:18
jeblairclarkb: but what is git:// ?00:18
clarkbjeblair: 34 seconds or so00:18
jeblairclarkb: that's kinda why i was thinking we should use it, but have http as a backup00:19
clarkbjeblair: I think I would prefer at least one extra set of eyes when we make the switch as there are a lot of moving parts00:19
anteayaclarkb: who do you have in mind? I'm no help there00:20
clarkbanteaya: jeblair :)00:20
*** wenlock has quit IRC00:20
jeblairclarkb: how many servers do you want, and what size?00:20
anteayaclarkb: ah okay, sorry thought you were talking about a 3rd person00:20
clarkbjeblair: I think you have a better feel for that than I do00:21
jeblairaha, i think i found the zuul problem00:21
clarkbjeblair: one additional thing to keep in mind with lots of small servers is the extra work gerrit will need to do replicating. Not sure if that is a big deal00:22
comstudbtw, i appreciate the work all of you are doing... despite all of the cursing that I'm doing. :)00:22
jeblairclarkb: good point; maybe we should go with several 8g servers then?00:22
anteayaI vote we let jeblair try to patch zuul first00:22
anteayathanks comstud00:22
anteayaI think we are doing our share of cursing too00:23
comstudi figure so00:23
clarkbjeblair: that sounds reasonable00:23
clarkbjeblair: start with ~4 then we can add more if needed?00:23
*** alexpilotti has quit IRC00:23
jeblairclarkb: yeah00:24
clarkblifeless: does the haproxy source balance type completely break if your sources are in the same /24 subnet?00:24
clarkblifeless: I am slightly worried that replication delays will cause problems with git if it hits 5 different servers at once (which it can do by default)00:25
*** jjmb has joined #openstack-infra00:25
clarkbat least with the http protocol. I think git:// is one connection00:25
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.
lifelessclarkb: git is one connection yes00:28
clarkbjeblair: ^ that makes http a viable fallback option00:28
*** dina_belova has joined #openstack-infra00:28
lifelessclarkb: multiple http requests can go to different servers00:28
anteaya10 lovely patches in the gate, currently passing tests ha ha ha ha *flash of lightning*00:28
clarkblifeless: typically, however we will be replicated to 5 different servers at potentially different rates00:28
*** changbl has joined #openstack-infra00:28
anteayathe 11th one has a LOST00:28
lifelessclarkb: if the first server hits its max_queue or goes down00:29
lifelessclarkb: yes, I get the case ;)00:29
lifelessclarkb: have you read the docs for source - when you add servers, you'll shuffle 1/new-num-servers of the http clients onto new servers00:29
clarkblifeless: using the default round robin it dynamically weighs them00:30
lifelessclarkb: right00:30
clarkboh looks like that would happen with source as there is a division of the hash00:31
lifelessclarkb: there isn't a mode where you can avoid http requests going to different servers; you only get to choose whether it happens all the time or when you have servers going down/up/added.00:31
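The reshuffling concern can be made concrete. haproxy's documentation describes `balance source` as hashing the client address and dividing by the total weight of the running servers. With the default `map-based` hash type, changing the number of servers can move many clients at once. The `consistent` hash type keeps the movement close to the share lifeless mentions. The Python sketch below assumes evenly spread hash values, modelled as plain integers, and a simple modulo mapping. It illustrates the effect and is not haproxy's actual hash function.

```python
def moved_fraction(num_keys: int, old_servers: int, new_servers: int) -> float:
    """Share of clients whose server changes under a modulo (map-based style) mapping.

    Keys 0..num_keys-1 stand in for evenly distributed source-address hashes.
    """
    moved = sum(1 for key in range(num_keys) if key % old_servers != key % new_servers)
    return moved / num_keys


if __name__ == "__main__":
    # Growing the pool from 4 to 5 servers.
    print(f"modulo mapping: {moved_fraction(20_000, 4, 5):.0%} of clients move")  # 80%
    print(f"consistent hashing ideal: about {1 / 5:.0%}")                          # 20%
```

When 4 servers grow to 5, only 4 out of every 20 consecutive keys keep their server. That means 80% of clients move, against roughly 1/5 in the ideal case.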
clarkbI think I am less worried about that case and more generally worried about it when everything is going smoothly and gerrit happens to update one server more slowly than the others00:31
lifelessare you running git-http-backend, or plain-ol-HTTP ?00:32
harlowjaqq, is there going to be an update to say the mailing list when jenkins is ok again?00:32
*** dina_belova has quit IRC00:33
anteayaharlowja: you will hear the party happening when jenkins is okay again00:33
harlowjasounds great :)00:33
anteayaand yes, we can do an update to the ml too00:33
harlowjathx for your guys hardwork00:33
harlowja*and gals00:33
anteayathanks harlowja00:34
lifelessclarkb: so, why do you want HTTP ?00:34
lifelessclarkb: you should read
lifelessthats why the http git CDN terrifies me :)00:36
*** chmouel has quit IRC00:36
jeblairstopped zuul again; it hit the bug and i have more logs00:36
*** westmaas has quit IRC00:36
*** westmaas has joined #openstack-infra00:36
*** chmouel has joined #openstack-infra00:37
*** GheRivero has quit IRC00:37
*** dtroyer has quit IRC00:37
*** juice has quit IRC00:37
*** GheRivero has joined #openstack-infra00:37
anteayajeblair: yay00:37
*** dtroyer has joined #openstack-infra00:37
anteayalet's hope the secret is in the logs00:37
*** jpeeler has quit IRC00:37
*** juice has joined #openstack-infra00:37
*** jpeeler has joined #openstack-infra00:38
*** jjmb1 has joined #openstack-infra00:39
clarkblifeless: for a couple reasons. 1. Apache is generally good about helping us not shoot ourselves in the foot, unlike git daemon 2. $RANDOM people can usually hit port 443 3. with https you get a reasonable amount of trust in who the remote end is00:39
*** jjmb has quit IRC00:40
clarkblifeless: I think we are getting better at item 1 with haproxy but items 2 and 3 aren't really solved with git daemon + haproxy00:40
*** melwitt has quit IRC00:40
clarkb2 and 3 aren't really gate issues00:42
clarkbjeblair: I have ten nova git clones over git protocol in a while true loop on the 30g host cloning from the 15g host through haproxy00:44
lifelessclarkb: might be an interesting read when you have time00:45
lifelessclarkb: so I'm reasonably sure smart http will still do multiple requests.00:46
lifelessclarkb: smart https will be totally fine. it's only http that will suck.00:46
lifelessclarkb: my suggestion, use roundrobin, but https and git ports only00:46
clarkblifeless: how is https different than http in this scenario?00:47
lifelessclarkb: [and https in tcp mode so you just get a tunnel]00:47
mgagnelifeless: where are their puppet manifests =)00:48
lifelessclarkb: I'm fairly sure git will use one tcp connection for https00:48
lifelessclarkb: because everyone knows how slow https handshakes are00:48
clarkblifeless: I wonder if that could be why it fails so hard on centos00:49
lifelessclarkb: and intermediaries can't mess you up, whereas for http there are intercepting proxies all over the damn place00:49
lifelessclarkb: https intercepting proxies are rarer00:49
clarkbthere is a long delay when starting an https clone. At first I thought it may be related to handshaking but cloning from /cgit over https does not have the same delay and they share the same ssl setup00:50
*** sarob_ has quit IRC00:51
clarkband tcpdump showed the client reporting a zero window size frequently00:52
*** jhesketh has quit IRC00:53
*** jhesketh has joined #openstack-infra00:53
*** nati_uen_ has quit IRC00:55
clarkbjeblair: I have taken the load on fake git.o.o up to 23ish and haven't had any clones fall over yet00:55
*** nati_ueno has joined #openstack-infra00:57
openstackgerritbenley proposed a change to openstack-infra/jenkins-job-builder: Add display-name job property.
openstackgerritAngus Salkeld proposed a change to openstack/requirements: Add some more filters to the .gitignore
openstackgerritAngus Salkeld proposed a change to openstack/requirements: Bump python-ceilometerclient to 1.0.3
jeblair#status alert Zuul is offline for troubleshooting01:02
openstackstatusNOTICE: Zuul is offline for troubleshooting01:02
*** ChanServ changes topic to "Zuul is offline for troubleshooting"01:02
clarkbjeblair: I am going to grab dinner soon01:05
jeblairclarkb: k01:05
*** reed has quit IRC01:06
clarkbjeblair: we will need to spin up those 4 nodes tomorrow then write puppet changes to replicate to them and add them to haproxy then we can put everything in01:06
clarkboh and changes to update the clone urls01:06
jeblairclarkb: are you happy with the haproxy config?01:06
*** nati_ueno has quit IRC01:07
clarkbjeblair: I think so. I hammered the git:// relatively hard with some for loops01:07
*** markmcclain has quit IRC01:07
*** fifieldt_ has joined #openstack-infra01:20
*** UtahDave has joined #openstack-infra01:20
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Make updateChange actually update the change
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Add some log lines
jeblairlet's hope that's it.01:23
jeblairi will install that and restart zuul01:23
* clarkb reviews really quickly01:25
jeblairclarkb: i will wait to start until you have reviewed01:25
jeblairclarkb: specifically it was the 'needed_by_changes that was the problem here01:25
jeblairand a patch series of like 20 changes01:26
clarkbok interesting thing about the list of files01:26
clarkbjeblair: yup lgtm. nice catch01:27
jeblairi just did that because it looked like it could be wrong too (just keep appending files)01:27
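jeblair's aside about the file list describes a common bug class. A refresh method appends to an existing list instead of replacing it, so every update accumulates duplicates. The sketch below shows that pattern with a hypothetical `Change` class and a made-up file name. It is not zuul's actual code.

```python
from typing import Iterable, List


class Change:
    """Hypothetical stand-in for a cached change record."""

    def __init__(self) -> None:
        self.files: List[str] = []

    def update_buggy(self, files: Iterable[str]) -> None:
        # Bug: every refresh adds the same files again.
        self.files.extend(files)

    def update_fixed(self, files: Iterable[str]) -> None:
        # Fix: replace the cached list with the latest data.
        self.files = list(files)


if __name__ == "__main__":
    buggy, fixed = Change(), Change()
    for _ in range(3):  # the same change refreshed three times
        buggy.update_buggy(["nova/api.py"])
        fixed.update_fixed(["nova/api.py"])
    print(len(buggy.files), len(fixed.files))  # 3 1
```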
clarkbya I agree01:27
clarkbok running off to find dinner and fungi01:27
jeblairok starting zuul01:27
*** dina_belova has joined #openstack-infra01:28
*** dina_belova has quit IRC01:33
*** gyee has quit IRC01:35
openstackgerritMathieu Gagné proposed a change to openstack-infra/config: Add commit-filter for cgit
*** huangtianhua has joined #openstack-infra01:37
jeblair#status ok Zuul is running again01:38
openstackstatusNOTICE: Zuul is running again01:38
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs | bugs |"01:38
anteayalet's see what happens now01:38
anteayaway to go jeblair!01:39
jeblairmay want to wait about an hour before you cheer :)01:39
anteayaI'll cheer again in an hour01:40
*** mriedem has joined #openstack-infra01:40
anteayahopefully soon I will learn enough to help you01:40
*** dhellmann is now known as dhellmann_01:48
*** xchu has joined #openstack-infra01:52
*** yaguang has joined #openstack-infra02:02
*** roaet has joined #openstack-infra02:17
roaetAlright. Read through that scroll back. I'll try to pay attention now. :)02:17
anteayaroaet: welcome02:19
anteaya66 in the gate, 25 in the check02:20
anteayaany free nodes are going to check now, the gate queue seems to be loaded02:20
anteayait took almost an hour to load those 91 patches02:21
anteayaand I expect there are at least 90 check patches to come into the check queue so I would look for your patch in the list in about an hour roaet02:22
roaetanteaya: thanks. I will do so. trying to wrap my mind around all the information.02:22
anteayait is a lot02:22
anteayaI suggest take a small piece02:22
anteayaif you know jenkins plugins already, start there02:22
anteayaask questions, don't worry how dumb02:23
anteayaI'll do my best to answer or help you find the answer02:23
roaetThanks. Look forward to working with you all.02:23
*** DennyZhang has joined #openstack-infra02:23
anteayathanks looking forward to working with you too02:23
anteayawhat time zone are you in?02:24
anteayaI'm in Eastern02:24
jeblairanteaya: i don't have a full list of everything that needs to be rechecked/reverified; only the list from when i stopped it the first time02:25
anteayajeblair: great02:25
jeblairanteaya: i'm slowly leaving recheck comments on those, to avoid the thundering herd02:25
anteayaI have been encouraging those to wait for the queue to populate and then recheck if they don't see theirs02:25
jeblairanteaya: but anything added since about 4 hours or so ago i won't have02:26
anteayaI am encouraging the thundering herd to let us build up slowly02:26
anteayaah okay02:26
anteayaI'll use that as a marker02:26
jeblair(if i've been leaving recheck comments though, my script will get to those again)02:26
roaetjeblair: I'm assuming if you hit my change then it was there (i see your recheck there)02:26
jeblairroaet: probably, which number?02:26
jeblairroaet: yeah, it's about 30-something down the list, so probably any time now02:27
roaetThanks a lot. I'll try to help however I can in the future. Don't want to mythical man month you at the moment. But I'll try my best.02:28
lifelessttx: jeblair: so - nova baremetal is broken at the moment02:28
lifelessdoes that impact anything release mgmt wise right now ?02:28
*** dina_belova has joined #openstack-infra02:29
jeblairlifeless: i don't think so; we're just around feature freeze (we're actually only at feature proposal freeze)02:29
jeblairlifeless: h3 milestone release is sep 602:30
jeblair\o/ 3 changes just merged02:33
*** dina_belova has quit IRC02:33
morganfainbergjeblair: just looking at i see zuul posted on it about 15 minutes ago, but don't see it in the queue02:34
morganfainbergoh wait nvm02:34
morganfainbergMisreading time02:34
morganfainbergcrap 2hrs ago02:34
anteayaah okay02:34
morganfainbergi obviously can't think.02:34
anteayano worries02:34
jeblairmorganfainberg: yeah, you'll want to reverify that then, sorry02:35
anteayawe are all tired02:35
morganfainbergjeblair: not a worry man, just trying to make sure i get these important ones in the queue02:35
*** gordc has joined #openstack-infra02:35
anteaya4 successful patches in the gate02:35
morganfainbergjeblair: is it going to take a while to pick up reverifies since it's still slowly reconstituting the queues?02:36
*** rcleere has joined #openstack-infra02:36
jeblairmorganfainberg: it's about 70 gerrit events behind right now, so if you add your reverify, it'll go onto the end of that queue first before it shows up in the gate queue02:37
morganfainberggreat, that's what i wanted to know.02:38
*** ftcjeff has joined #openstack-infra02:38
jeblair6 more changes merged02:38
openstackgerritSteve Baker proposed a change to openstack-infra/config: Generate heat docs on check and gate
morganfainbergmerged is good!02:39
anteayayay merged02:39
jeblairthe git server is still going to be a big problem; a lot of tests are going to fail because of that.02:40
anteayathe graph seems to be in UTC02:41
anteayahopefully zuul can stay up for the rest of the night (insert appropriate time of day for yourself, dear reader)02:42
anteayajeblair: so the plan is to address git changes tomorrow?02:51
anteayayay 3 jobs in post02:51
jeblairanteaya: yes.  just like today.02:52
anteayasorry I thought today was benchmarking and tomorrow is making the changes02:52
jeblairanteaya: nope, benchmarking wasn't on the agenda until after we rolled it out.  it just took a while for haproxy to get set up.02:53
morganfainberganteaya: ok, my 2 changesets that are needed arrived in the gate queue, thanks again for keeping me posted on what was up over here earlier on.02:54
anteayasorry I missed that point02:54
anteayayou are welcome, morganfainberg, thanks for your patience02:54
gordcsweet, finally got a jenkins result back. big thanks jeblair and anyone else working on the issues.02:55
anteayayay gordc02:58
anteayacongratulations on your jenkins result02:58
anteayajeblair has been working hard on it02:58
gordcsmall victories :) yep, i've seen his name all over the rechecks.02:59
*** markmcclain has joined #openstack-infra03:00
anteayagotta celebrate them when they occur03:00
*** mriedem has quit IRC03:01
*** tjones has joined #openstack-infra03:02
*** tjones has left #openstack-infra03:02
*** blamar has quit IRC03:17
anteayazuul has been up for an hour and 40 minutes, how is it looking jeblair?03:19
anteayacan I cheer again?03:19
anteayaeverything I can see looks good03:20
anteayajobs are finishing, others are starting03:20
anteayaroaet your patch is being tested as we speak03:22
*** dina_belova has joined #openstack-infra03:29
*** jfriedly has quit IRC03:32
anteayathere are two patches, a cinder and a neutron patch that have been in the post queue for a while03:34
*** dina_belova has quit IRC03:34
anteayathe translation-update job passed for both but the other three jobs: tarball, coverage and docs are queued and have been for a while03:34
anteayaI will watch them and see if they move along03:35
anteayagate 72, check 153, post 203:35
Alex_Gaynorneed more workers :)03:35
anteayathat is what jeblair and clarkb talked about creating03:36
anteayaI think it is on tomorrow's agenda03:36
Alex_Gaynor"need more cloud" :)03:36
anteayamoar cloud03:37
anteayayeah, I hear that03:37
Alex_Gaynordid we land either the git mirroring or the zuul fix, or are we just flying on luck?03:37
anteayaokay those post patches have jobs running03:37
anteayazuul fix landed03:37
anteayabeen up for two hours with the new zuul fix03:38
Alex_Gaynoroh, awesome, so now it should be at least smooth sailing (but slow)03:38
anteayaso far, so good from what I can see03:38
anteayasmooth but slow would be great03:38
anteayahanging out to check on the smooth part03:38
Alex_Gaynorhmm, was the commit in the zuul repo? I don't see any new commits03:38
anteayanot yet03:38
anteayalet me dig it up03:39
Alex_GaynorI thought I read everything in the backlog, I must have missed it03:39
anteayaeverything I understand has me believing that jeblair made these changes before the last zuul restart03:40
anteayaI rely on jeblair to correct me if I am wrong03:40
anteayain this regard03:40
anteayathanks for asking03:41
anteayayou actually know more about what is going on than I do03:41
Alex_GaynorI seriously doubt it :)03:42
anteayaha ha ha03:42
*** yaguang has quit IRC03:42
anteayawell you know a lot03:42
*** nati_ueno has joined #openstack-infra03:42
anteayagrateful for your input03:42
*** dims has quit IRC03:43
*** nati_ueno has quit IRC03:44
*** yaguang has joined #openstack-infra03:45
anteayayay Queue lengths: 0 events, 0 results.03:51
anteayagate 71, post 1, check 15203:51
anteayait is deleting a bunch of servers and starting a bunch more jobs03:52
*** HenryG_ has joined #openstack-infra03:54
*** dstufft_ has joined #openstack-infra03:54
*** DennyZhang has quit IRC03:54
*** cyeoh has quit IRC03:54
*** soren has quit IRC03:54
*** dstufft has quit IRC03:55
*** soren has joined #openstack-infra03:55
*** DennyZhang has joined #openstack-infra03:55
*** cyeoh has joined #openstack-infra03:55
*** HenryG has quit IRC03:57
*** vogxn has joined #openstack-infra03:57
anteayaslow and smooth seems to characterize what I am seeing right now03:58
Alex_Gaynoruh oh, we've got a failure coming up in the gate pipeline :(03:58
anteayayeah that always makes me sad too03:59
anteayathe third patch03:59
anteayaso two have a chance of getting in, then reset03:59
* Alex_Gaynor shaves 45 minutes off his life03:59
anteayawhen I see 6 or 8 passing in the gate, I do a little happy dance in my chair04:00
anteayahere is a url for the test node graph04:00
anteayarefreshing the page updates the graph04:00
anteayaI have to turn in04:00
anteayawhat patch are you waiting on Alex_Gaynor?04:01
Alex_GaynorNothing in particular, I just like watching the patches flow through the system04:01
anteayaI hear that04:02
anteaya3 in post, yay!04:02
anteayaokay I'm done04:02
anteayahave a good night Alex_Gaynor04:03
Alex_Gaynoryou too!04:03
*** anteaya has quit IRC04:03
*** eharney has joined #openstack-infra04:03
*** dkliban has joined #openstack-infra04:04
Alex_Gaynoruh oh, the results seem to have started to build up again04:07
Alex_Gaynorthere are several changesets in check that should have been processed already04:08
Alex_Gaynormaybe not, coming back down04:12
*** huangtianhua has quit IRC04:15
*** dklyle has joined #openstack-infra04:23
*** dtroyer has quit IRC04:23
*** dtroyer has joined #openstack-infra04:23
*** retr0h has quit IRC04:23
*** david-lyle has quit IRC04:23
*** samalba has quit IRC04:23
*** comstud has quit IRC04:23
*** mberwanger has joined #openstack-infra04:24
*** retr0h has joined #openstack-infra04:25
*** retr0h has joined #openstack-infra04:25
*** comstud has joined #openstack-infra04:25
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Add support for parameter filters in copyartifact
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Fixed timeout wrapper
*** samalba has joined #openstack-infra04:27
*** gordc has quit IRC04:29
*** dina_belova has joined #openstack-infra04:30
*** rcleere has quit IRC04:32
*** markmcclain has quit IRC04:33
*** markmcclain has joined #openstack-infra04:33
*** dina_belova has quit IRC04:35
clarkbAlex_Gaynor: seems ok right now04:35
clarkbAlex_Gaynor: have you seen any more oddness?04:35
Alex_Gaynorclarkb: yeah, must have just been a temporary blip04:35
Alex_GaynorAlso, damn you centos for being the only thing with py2604:36
clarkbAlex_Gaynor: I agree04:36
clarkbcentos git makes me so sad04:36
*** boris-42 has joined #openstack-infra04:36
Alex_Gaynorclarkb: I don't think so, the only other thing I've noticed is that sometimes the SCP step takes an abnormally long time, way longer than I remember it taking in previous weeks04:37
*** DennyZhang has quit IRC04:37
clarkbAlex_Gaynor: there may be contention on the log server04:37
*** eharney has quit IRC04:37
Alex_Gaynormakes sense04:37
Alex_Gaynormost insane CI infrastructure I've ever been a part of04:37
clarkbit's possible the finds that clean things up slow stuff down when they run04:38
clarkbAlex_Gaynor: the big CPU blips you see on that page are a result of find running and deleting old logs, compressing new things and so on04:39
*** Anju has left #openstack-infra04:39
clarkbAlex_Gaynor: I will make a note to look at that once git is sorted04:39
Alex_Gaynorclarkb: yeah, definitely a low priority item :)04:39
clarkbjeblair: my git plan. 1. spin up new servers 2. replicate from gerrit to new servers. 3. merge change to use git:// in g-g-p 4. merge haproxy change 5. merge change to add haproxy nodes04:40
clarkbjeblair: 4 and 5 may end up being squashed together or at least merged together04:40
clarkbthen at some point we can use git:// in d-g but d-g should continue to happily use https so we can make sure nothing is falling over before doing that04:41
*** dstufft_ is now known as dstufft04:46
*** ftcjeff has quit IRC04:49
* Alex_Gaynor wonders if there's merit in installing a ppa for 2.6 on some of the other nodes04:50
clarkbmaybe? there is value in testing on centos04:51
clarkbnow we know that you don't want to deploy openstack with git on centos :)04:52
clarkbat least not with https and git-http-backend04:52
* Alex_Gaynor doesn't want to deploy much of anything on centos04:52
* dstufft concurs with Alex_Gaynor 04:52
Alex_Gaynorbasically the entire check queue is bottlenecked on 2.6 :{04:53
*** SergeyLukjanov has joined #openstack-infra04:53
clarkbAlex_Gaynor: yeah made worse by the gate monopolizing those resources04:53
Alex_Gaynorclarkb: probably better this way, as long as we don't get a gate reset04:54
clarkbya, making the gate a higher priority was done with reason04:55
clarkbin part to remove barriers to merging security related fixes04:55
Alex_Gaynorthe right way to address starvation in check is to just add more workers, not mess with the algorithms, IMO04:55
*** nati_ueno has joined #openstack-infra04:56
clarkbAlex_Gaynor: we definitely want to use nodepool to dynamically add slaves that hang around longer04:56
clarkbAlex_Gaynor: mordred has even hacked up kexec machinery that might be useful in having single use slaves that aren't as expensive to use as today's single use slaves04:56
clarkbAlex_Gaynor: the tricky bit there is we have single use slaves like we do today because tests get root and can really hose stuff04:57
clarkbAlex_Gaynor: making sure that kexec can reboot into a good state without having been hosed by a test is a bit of work04:57
Alex_GaynorI wonder if there's any prior art04:58
clarkbAlex_Gaynor: jeblair had stuff to do it when the tests ran on hardware04:58
clarkbbut I am not sure how worried they were of root abuse (intentional or not) at the time04:58
*** dkliban has quit IRC05:00
fungiclarkb: still reading scrollback but for future reference i think you can pass git ipv6 address literals using standard square-bracket notation (git clone http://[2001:4800:7812:514:3bc05:03
pleia2ah! good to know05:04
*** Dr01d has joined #openstack-infra05:04
clarkbfungi: thanks. I wonder why it doesn't split on the right side05:05
clarkbseems like that should work just fine. I guess if you leave the port off you won't know if it is a port or part of the address05:05
clarkbfungi: if you want to poke at the centos git cloning, the 15g server jeblair listed in scroll back iirc (I remembered that somehow) is the haproxy + apache + git server and has openstack/nova on it05:07
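fungi's bracket suggestion and clarkb's point about the port can be checked with Python's `urllib.parse.urlsplit`. It follows the RFC 3986 convention of wrapping an IPv6 literal in square brackets, so the colons in the address are not read as a port separator. The address comes from the 2001:db8::/32 documentation range and the URLs are illustrative only. git's own URL parsing is separate code and is not shown here.

```python
from urllib.parse import urlsplit


def host_and_port(url: str):
    """Return (hostname, port) as urlsplit sees them; port is 'invalid' if unparseable."""
    parts = urlsplit(url)
    try:
        port = parts.port
    except ValueError:
        port = "invalid"
    return parts.hostname, port


if __name__ == "__main__":
    # Bracketed literal: host and port are unambiguous.
    print(host_and_port("http://[2001:db8::1]:8080/openstack/nova"))  # ('2001:db8::1', 8080)
    print(host_and_port("http://[2001:db8::1]/openstack/nova"))       # ('2001:db8::1', None)
    # Unbracketed literal: everything after the first colon is taken as the port.
    print(host_and_port("http://2001:db8::1/openstack/nova"))         # ('2001', 'invalid')
```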
clarkbfungi: the 30g server 198.something was where I was running the client05:07
clarkbfungi: haproxy is listening on 80, 443, and 9418 and apache is on 8080, 4443. git-daemon is on 2941805:09
*** Ryan_Lane has quit IRC05:10
*** nicedice_ has quit IRC05:13
fungiclarkb: cool. i'll see if i can spot any major differences in window scaling defaults in the kernel tcp/ip settings vs on ubuntu precise05:15
*** mberwanger has quit IRC05:16
*** primeministerp has quit IRC05:22
*** primeministerp has joined #openstack-infra05:29
*** dina_belova has joined #openstack-infra05:30
*** sridevi has joined #openstack-infra05:32
*** dina_belova has quit IRC05:35
openstackgerritA change was merged to openstack/requirements: Bump python-swiftclient requirement to >=1.5
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Fixing override-votes for gerrit trigger
*** morganfainberg is now known as morganfainberg|a05:56
*** dmakogon_ has joined #openstack-infra05:58
*** UtahDave has quit IRC05:58
*** xchu has quit IRC06:07
*** SlickNik has quit IRC06:09
*** SlickNik has joined #openstack-infra06:09
*** morganfainberg|a is now known as morganfainberg06:14
*** SergeyLukjanov has quit IRC06:15
*** markmc has joined #openstack-infra06:20
*** dguitarbite has quit IRC06:21
markmcclarkb, fwiw,
uvirtbotLaunchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New]06:21
markmcclarkb, just trying to get the info in one place06:21
markmcclarkb, I'm gonna see what the story is with git being rebased in RHEL606:22
markmcclarkb, happy to help build a newer git RPM, though, if you'd use that06:22
*** sridevi has quit IRC06:23
clarkbmarkmc: thanks. I would be open to building newer git rpms but jeblair was understandably hesitant06:23
markmcclarkb, ok06:24
*** xchu has joined #openstack-infra06:24
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
*** HenryG_ has quit IRC06:30
*** mikal has quit IRC06:30
*** dina_belova has joined #openstack-infra06:31
*** p5ntangle has joined #openstack-infra06:31
*** mikal has joined #openstack-infra06:32
*** Dr01d has quit IRC06:34
*** dina_belova has quit IRC06:36
*** markmc has quit IRC06:38
*** nayward has joined #openstack-infra06:42
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
*** p5ntangle has quit IRC06:42
*** AJaeger has joined #openstack-infra06:57
*** nati_ueno has quit IRC06:58
*** Dr01d has joined #openstack-infra07:01
*** AJaeger has quit IRC07:05
*** SergeyLukjanov has joined #openstack-infra07:05
*** pblaho has joined #openstack-infra07:05
*** odyssey4me4 has joined #openstack-infra07:14
*** fbo_away is now known as fbo07:16
*** markmcclain has quit IRC07:18
*** yolanda has joined #openstack-infra07:21
*** sridevi has joined #openstack-infra07:24
*** Anju has joined #openstack-infra07:25
*** sridevi has quit IRC07:28
*** vogxn has quit IRC07:28
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
*** dina_belova has joined #openstack-infra07:32
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
*** dina_belova has quit IRC07:36
*** jpich has joined #openstack-infra07:37
*** mkerrin has joined #openstack-infra07:39
*** SergeyLukjanov has quit IRC07:40
*** p5ntangle has joined #openstack-infra07:43
*** afazekas has joined #openstack-infra07:52
*** boris-42 has quit IRC07:58
*** p5ntangle has quit IRC08:01
*** AJaeger has joined #openstack-infra08:01
*** AJaeger has joined #openstack-infra08:01
*** p5ntangle has joined #openstack-infra08:02
*** xchu has quit IRC08:04
*** AJaeger has quit IRC08:06
*** AJaeger has joined #openstack-infra08:12
*** AJaeger has joined #openstack-infra08:12
*** fifieldt_ has quit IRC08:15
*** xchu has joined #openstack-infra08:16
*** p5ntangl_ has joined #openstack-infra08:19
*** vogxn has joined #openstack-infra08:20
*** AJaeger has quit IRC08:20
*** cthulhup has joined #openstack-infra08:21
*** markmc has joined #openstack-infra08:22
*** p5ntangle has quit IRC08:22
*** michchap_ has joined #openstack-infra08:25
*** cthulhup has quit IRC08:25
*** dmakogon_ has quit IRC08:26
*** michchap has quit IRC08:27
*** ruhe has joined #openstack-infra08:30
*** dina_belova has joined #openstack-infra08:32
*** dina_belova has quit IRC08:37
*** koobs` has quit IRC08:45
*** koobs` has joined #openstack-infra08:45
*** koobs` is now known as koobs08:45
*** AJaeger has joined #openstack-infra08:50
*** AJaeger has quit IRC08:50
*** AJaeger has joined #openstack-infra08:50
*** p5ntangl_ has quit IRC08:54
*** AJaeger has quit IRC08:55
*** p5ntangle has joined #openstack-infra08:55
*** sridevi has joined #openstack-infra08:57
*** xBsd has joined #openstack-infra09:02
*** sridevi has quit IRC09:03
Anjucyeoh :   in neutron cli there is an optional argument of json and xml09:05
markmcclarkb, jeblair, there are git packages available for centos, signed with the centos testing key:
uvirtbotLaunchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New]09:12
*** cthulhup has joined #openstack-infra09:15
*** cthulhup has quit IRC09:20
openstackgerritJulien Danjou proposed a change to openstack-infra/statusbot: Handle topic via a configuration file
*** michchap_ has quit IRC09:30
*** dina_belova has joined #openstack-infra09:33
openstackgerritSerg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects
openstackgerritSerg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects
openstackgerritSerg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects
*** dina_belova has quit IRC09:37
*** AJaeger has joined #openstack-infra09:37
*** AJaeger has joined #openstack-infra09:37
*** boris-42 has joined #openstack-infra09:37
*** enikanorov_ has joined #openstack-infra09:47
*** AJaeger has quit IRC09:49
*** BobBallAway is now known as BobBall09:57
*** xchu has quit IRC09:58
*** afazekas has quit IRC10:07
*** thomasbiege has joined #openstack-infra10:10
*** ruhe has quit IRC10:15
*** morganfainberg is now known as morganfainberg|a10:21
*** ruhe has joined #openstack-infra10:26
*** p5ntangl_ has joined #openstack-infra10:27
*** thomasbiege has quit IRC10:28
*** weshay has joined #openstack-infra10:29
*** p5ntangle has quit IRC10:30
*** ruhe has quit IRC10:30
*** dina_belova has joined #openstack-infra10:33
*** vogxn has quit IRC10:35
*** vogxn has joined #openstack-infra10:37
*** thomasbiege has joined #openstack-infra10:38
*** vogxn has quit IRC10:38
*** dina_belova has quit IRC10:38
*** vogxn has joined #openstack-infra10:38
*** vogxn has left #openstack-infra10:42
*** vogxn has joined #openstack-infra10:42
*** openstack has joined #openstack-infra15:12
markmcoh look, everything got requeued15:12
markmcdark magic at work15:12
anteayaexcept for 41070 at the bottom15:13
anteayait is still running15:13
*** reed has joined #openstack-infra15:13
markmc41070 is the first patch15:14
markmcnone of the rest can merge without them15:14
*** p5ntangle has quit IRC15:14
jeblairmarkmc: it just figured that out15:15
jeblair2013-08-22 15:14:42,294 INFO zuul.DependentPipelineManager: Dequeuing change <Change 0x7faf68327050 42433,7> because it can no longer merge15:15
markmcand 4 have disappeared15:15
*** ruhe has quit IRC15:17
jeblairthe reason it's slow is because it's building up proposed states of the git repo _before_ it checks that.  in retrospect, that does seem like a sub-optimal ordering.15:17
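jeblair's point is about ordering. zuul builds proposed git states for a change before it checks whether the change can still merge. The Python below is a minimal sketch of the cheaper ordering. The Change class, still_mergeable and process_queue are hypothetical names used for illustration; they are not zuul's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    """Hypothetical stand-in for a queued change (not zuul's real class)."""
    number: str
    can_merge: bool = True
    needs: List["Change"] = field(default_factory=list)


def still_mergeable(change: Change) -> bool:
    """A change can merge only if it and everything it depends on can merge."""
    if not change.can_merge:
        return False
    return all(still_mergeable(dep) for dep in change.needs)


def process_queue(queue, build_repo_state):
    """Drop unmergeable changes *before* doing the expensive git work."""
    kept = []
    for change in queue:
        if not still_mergeable(change):
            print(f"Dequeuing change {change.number} because it can no longer merge")
            continue
        build_repo_state(change)  # expensive: prepare the proposed git state
        kept.append(change)
    return kept
```

Because the cheap dependency check runs first, a change stacked on an unmergeable parent (like the series behind 41070 above) is dropped without any git work being done for it.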
jd__just for my personal culture, the slowness is a problem with zuul or lack of resource to run the jobs?15:17
jd__'cause I saw a lot of checks waiting for python26 only15:18
anteayathat is a git issue15:18
markmcspeaking of python2615:18
anteayaproxying problems15:18
anteayawe are trying to address git today15:18
markmcjeblair, saw my message about newer centos6 git rpms?15:18
anteayajd__: so more than one issue15:18
jd__anteaya: is there a trace of that I can read about?15:18
anteayahopefully the problem with zuul has been addressed15:18
anteayajust the log for the last 3-4 days15:19
anteayait has slowly built15:19
uvirtbotLaunchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New]15:19
anteayathe tl;dr version is that zuul had a bug which was hard to trace but jeblair found it yesterday15:19
jeblairmarkmc: yes, thanks; i'm not sure if we should try to do that now or stick with the current tentative plan and switch to the git protocol, which is faster even with the version in centos615:19
jeblair(after we scale out the git server)15:20
markmcjeblair, cool15:20
anteayathe git issue I have a weak grasp of, but it is about not having enough git repos available to clone/download and we are having timeouts15:20
jeblairanteaya: the git server is overloaded if we run all of the jobs we have at once15:20
anteayathere we go, thanks jeblair15:20
anteayawe are working to better load balance the git server15:21
jeblairwhich is why we haven't added more centos slaves, because at this point, adding more slaves will only make that worse15:21
anteayahoping to make progress on that today15:21
*** rfolco has joined #openstack-infra15:21
anteayaright, it just increases the overload on the git server15:21
* anteaya feels understanding is starting to fall into place15:21
jeblairzaro: when you're up and have a minute; i have no idea why this happened:
markmcanteaya, if you're up for it, it might be cool to file bugs about ongoing stuff like this and update the bug as progress is made15:22
*** thomasbiege has quit IRC15:22
anteayaI can make an attempt on it15:22
jeblairzaro: oops, better paste here:
markmcthat'd be awesome15:22
anteayaI welcome your direction on it as I go along, markmc, thanks15:22
jeblairanteaya: ++15:22
BobBallrecheck no bug doesn't seem to be working?15:23
anteayaI will be afk for about 10 minutes and then will get started on bug reports15:23
BobBallI added a few on my changes and the check queue is still empty?15:23
jeblairBobBall: zuul has a backlog of gerrit events right now, it should get to it15:23
BobBallI'll be patient then :)15:23
*** dims has joined #openstack-infra15:23
jeblairBobBall: "Queue lengths: 126 events" is the operative thing15:23
*** thomasbiege has joined #openstack-infra15:24
jeblairzaro: anyway, it looks like jenkins said it was taking the node offline, but it apparently wasn't offline when the functions were registered, so it ran a job anyway15:24
BobBallahhhh I see15:24
*** vogxn has quit IRC15:25
anteayaBobBall: we just restarted zuul, and we have the queue in a staggered start15:26
anteayaonce the queue length is 0 events - will probably take about 90 minutes15:27
anteayaif you don't see your patch, then recheck15:27
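The "Queue lengths" banner on the zuul status page shows whether a recheck is still waiting in the event backlog. Below is a small sketch for reading it. It assumes the banner has the shape quoted in this log ("Queue lengths: 50 events, 84 results"), and the helper names are hypothetical.

```python
import re


def parse_queue_lengths(status_line: str) -> dict:
    """Parse a banner such as 'Queue lengths: 50 events, 84 results'."""
    counts = {}
    for number, kind in re.findall(r"(\d+)\s+(events|results)", status_line):
        counts[kind] = int(number)
    return counts


def backlog_cleared(status_line: str) -> bool:
    """True once zuul reports zero pending gerrit events."""
    return parse_queue_lengths(status_line).get("events") == 0
```

If the patch still does not show up after the event count reaches zero, that is the point at which to leave a recheck.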
BobBallMakes sense15:27
BobBallstop it overloading15:28
BobBallMight be worth adding the "wait 90 minutes" in the topic?  I'm sure I won't be the only person asking this15:28
*** jjmb has quit IRC15:28
anteayaI am going to work on some bugs as communication tools15:29
jeblairzaro: i think it's because when that happens, jenkins disconnects the node asynchronously; so it may not actually be offline for a while15:29
anteayathe wait will change as time passes, so the message will get stale quickly15:29
anteayaI can answer questions and folks are good about reading logs15:29
*** ruhe has joined #openstack-infra15:30
*** thomasbiege has quit IRC15:32
*** pblaho has quit IRC15:33
*** dina_belova has joined #openstack-infra15:35
reedhello folks15:35
jeblairreed: hello15:35
*** CaptTofu has quit IRC15:36
fungijeblair: clarkb: so i tried adjusting some tcp settings on git-test-15 but cloning nova from it via https was still taking ~8 minutes with nothing else going on15:40
fungiplus a lot of errors like...15:40
fungierror: Unable to get pack index
*** Ryan_Lane has joined #openstack-infra15:40
fungierror: Unable to find 8f47cb63996d34ce3d8fcaf9f449b400ce033c70 under
fungiCannot obtain needed object 8f47cb63996d34ce3d8fcaf9f449b400ce033c7015:40
fungiet cetera15:40
jeblairfungi: well that's what happens when it falls back on the dumb http protocol15:41
*** vogxn has joined #openstack-infra15:41
fungias opposed to git and http protocol which averaged a snappy 40 seconds15:41
fungiso yeah, i suspect there is something terribly wrong on centos as pertains to the git cgi backend and https but still no clue what15:41
* fungi has to dash out to meet some people but will check back in later15:42
clarkbfungi I actually think it is a client side issue15:42
*** gyee has joined #openstack-infra15:42
fungiclarkb: marvellous15:42
clarkbother git clients clone from that host just fine. centos 1.7.1 git does not15:42
clarkbjeblair I am going to run to the office in a minute then will begin the load balance git process15:43
jeblairclarkb: cool, i should be ready to pitch in then15:43
pleia2clarkb: want me to do some time tests with 1.7.1 and the rpmforge 1.7.11 so at least we have a data point?15:43
clarkbpleia2 yes testing newer git clients on centos would at least help confirm it is client side15:44
pleia2k, will do15:44
markmcpleia2, I provided a link to a repo containing for centos, maintained by a centos maintainer15:46
* markmc digs it up again15:46
pleia2markmc: saw the lp link, I can use that15:47
uvirtbotLaunchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New]15:47
markmcpleia2, ok, great15:47
markmcpleia2, just wasn't sure from "rpmforge"15:47
pleia2markmc: the packages on rpmforge seem to be the most common way folks install newer versions on centos15:48
markmcpleia2, who maintains those?15:48
pleia2markmc: I don't know15:49
markmcpleia2, right :)15:49
*** wu_wenxiang has joined #openstack-infra15:52
wu_wenxiang, I tried "recheck no bug" twice, however didn't start check process15:53
*** pabelanger_ has joined #openstack-infra15:55
*** pabelanger_ has quit IRC15:56
*** pabelanger_ has joined #openstack-infra15:56
anteayahere is the LOST test logs bug:
uvirtbotLaunchpad bug 1215511 in openstack-ci "LOST test logs" [Undecided,New]15:57
*** CaptTofu has joined #openstack-infra15:57
*** dina_belova has quit IRC15:58
*** pabelanger has quit IRC16:00
*** pabelanger_ is now known as pabelanger16:00
markmcanteaya, nice16:00
*** pabelanger_ has joined #openstack-infra16:00
*** CaptTofu_ has joined #openstack-infra16:01
*** CaptTofu has quit IRC16:02
wu_wenxiang, I tried "recheck no bug" twice, however didn't start check process, Could anyone help? Thanks16:03
pleia2wu_wenxiang: someone should be able to take a look soon, it's been a bit of a crazy week16:05
jeblairwu_wenxiang: it's probably in the backlog of gerrit events16:05
*** markmc has quit IRC16:05
jeblairwu_wenxiang: "Queue lengths: 106 events" on the status page; it should get to it soon16:05
wu_wenxiangpleia2: jeblair: Thanks16:06
wu_wenxiangpleia2: crazy week means too much commit?16:07
*** dkranz has quit IRC16:07
*** cthulhup has joined #openstack-infra16:09
*** datsun180b has quit IRC16:09
*** datsun180b has joined #openstack-infra16:09
*** alexpilotti_ has joined #openstack-infra16:11
*** dklyle is now known as david-lyle16:11
*** ruhe has quit IRC16:11
*** pabelanger has quit IRC16:11
*** zul has quit IRC16:12
*** cppcabrera has left #openstack-infra16:12
anteayahere is the my rechecked/reverified patch isn't in the queue:
uvirtbotLaunchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in" [Undecided,New]16:13
*** alexpilotti has quit IRC16:13
*** alexpilotti_ is now known as alexpilotti16:13
anteayagoing to grab a bite to eat16:14
*** wu_wenxiang has quit IRC16:14
*** SergeyLukjanov has quit IRC16:14
jd__anteaya: ah, your bug is exactly the question I was going to ask!16:14
anteayamy first customer16:15
anteayajd__: add comments if I left anything out16:15
*** jfriedly has joined #openstack-infra16:17
*** ^d has joined #openstack-infra16:19
*** ^d has joined #openstack-infra16:19
*** ruhe has joined #openstack-infra16:19
*** ruhe has quit IRC16:19
clarkbjeblair: I am in front of the big monitor now16:20
clarkbjeblair: I am going to spin up git01 through git04 on the ci account as 8GB centos nodes16:21
clarkbjeblair: and will point them at the puppet development env so that they get all of the cgit stuff16:21
*** arezadr has quit IRC16:21
clarkbThen I will propose a change to replicate gerrit to them and update the existing change to balance across them. Once gerrit replication has caught up merge the haproxy and g-g-p changes16:22
zarojeblair: do we need a double check to make sure slave is offline in StartJobWorker?16:22
jeblairzaro: that's the thing, there's a check in registerfunctions; and since it registered 46 functions, it must have been online16:23
jeblairzaro: oh, i see16:23
jeblairzaro: a check right before we accept a job16:23
*** arezadr has joined #openstack-infra16:24
jeblairzaro: yeah i think if we wanted to do that, maybe put it in the gearmanworkerimpl right before we do a grab_job?16:24
zarojeblair: that would probably work.  i was thinking another one after setting slave offline?16:25
jeblairzaro: so we don't get it from gearman (once we get the job from gearman, it doesn't matter, we have to run it)16:25
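jeblair's suggestion is to re-check the node's state immediately before asking gearman for a job, because a grabbed job has to be run. The real gearman plugin is a Java Jenkins plugin. The sketch below only models the idea in Python, and Node and maybe_grab_job are hypothetical names.

```python
import threading


class Node:
    """Hypothetical Jenkins node whose offline flag is set asynchronously."""

    def __init__(self):
        self._offline = threading.Event()

    def take_offline(self):
        self._offline.set()

    def is_online(self) -> bool:
        return not self._offline.is_set()


def maybe_grab_job(node: Node, grab_job):
    """Re-check the node right before asking gearman for work.

    Once a job has been grabbed it must be run, so this is the last
    point at which a node that is going offline can refuse work.
    """
    if not node.is_online():
        return None
    return grab_job()
```

This narrows the race rather than closing it. The node can still go offline between the check and the call to grab_job.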
*** dkranz has joined #openstack-infra16:25
jeblairzaro: what do you mean about setting the slave offline?16:25
jeblairzaro: and here's an idea we should have thought about earlier -- why don't we have the gearman plugin always return work_complete if the jenkins job finishes (regardless of the outcome); but have it return work_fail if it grabs a job and finds that it can't run it...16:27
jeblairzaro: it already returns work_exception if there is a problem running it; i should have zuul catch that case and re-run the job16:27
jeblair(that would help with some of the strange exceptions we've been seeing)16:28
jeblairzaro: and then later, if we do the thing with work_fail, we could have zuul do the same thing (re-run the job)16:28
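The result-handling proposal can be summarised as a small decision function:

- WORK_COMPLETE is returned whenever the Jenkins job finished, whatever its outcome.
- WORK_FAIL is returned when the worker grabbed a job it could not run.
- WORK_EXCEPTION is returned for errors while running.

In the last two cases zuul would re-run the job. WORK_COMPLETE, WORK_FAIL and WORK_EXCEPTION are gearman protocol packet types. The enum, the payload shape and handle_result below are a hypothetical Python sketch of the zuul side, not zuul code.

```python
from enum import Enum


class GearmanResult(Enum):
    WORK_COMPLETE = "work_complete"    # jenkins job ran; outcome is in the payload
    WORK_FAIL = "work_fail"            # proposed: worker grabbed a job it could not run
    WORK_EXCEPTION = "work_exception"  # something went wrong while running the job


def handle_result(result: GearmanResult, payload: dict) -> str:
    """Decide what zuul should do with a finished gearman job (proposal sketch)."""
    if result is GearmanResult.WORK_COMPLETE:
        # The build finished; report whatever Jenkins said.
        return payload.get("result", "UNKNOWN")
    # The job either never ran or blew up: launch it again.
    return "RELAUNCH"
```

In practice a retry cap would be sensible, so that a persistently broken job is not relaunched forever.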
*** SergeyLukjanov has joined #openstack-infra16:28
jeblairclarkb: great, i'll be with you in just a min.16:28
*** dina_belova has joined #openstack-infra16:29
*** datsun180b has quit IRC16:29
zarojeblair: i don't see a problem with that off the top of my head.16:30
*** mrodden has quit IRC16:35
HenryGHow do I search for gerrit reviews containing a specific string in the commit message?16:36
*** saper has quit IRC16:37
*** saper has joined #openstack-infra16:37
clarkbHenryG: there may be a way to do it with grep and the ssh query interface, but the gerrit ui does not offer that functionality16:38
clarkbHenryG: upstream gerrit has played with using lucene to index that stuff but it gets expensive quickly16:38
clarkbjeblair: rax gave me a build error on the first host (I was going to build one before the others to smooth out any additional stuff). Have you seen BUILD 0 then ERROR before?16:39
HenryGclarkb: thanks. :(16:40
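clarkb's suggestion of grepping the ssh query interface can be sketched as a filter over JSON query output. The sketch makes these assumptions about `gerrit query --format=JSON`:

- It emits one JSON object per line.
- The last line is a row whose "type" is "stats".
- Each change includes a "commitMessage" field when `--commit-message` is passed.

Check that your Gerrit version supports that option. matching_changes is a hypothetical helper.

```python
import json


def matching_changes(json_lines, needle: str):
    """Yield (number, subject) for changes whose commit message contains needle.

    Assumes one JSON object per line, with a trailing "stats" row.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") == "stats":
            continue  # summary row, not a change
        if needle in record.get("commitMessage", ""):
            yield record.get("number"), record.get("subject")
```

The lines would come from the ssh query output. A plain substring test is used here, so any regex matching is left to the caller.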
*** datsun180b has joined #openstack-infra16:40
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Add option to test jenkins node before use
clarkbjeblair: I am going to try a second host and see if this is transient16:41
*** zul has joined #openstack-infra16:41
jeblairclarkb: all the time, yeah, just try again16:41
*** cthulhup has quit IRC16:41
jeblairclarkb: ^ that patch is untested; no rush -- but something to think about in the back of your head for after the git server.16:41
*** pcrews has quit IRC16:41
jeblairclarkb: the idea is we can have nodepool run a very simple test job before actually putting each node into service16:42
jeblairclarkb: it might be useful for some of the weird errors we've been seeing from jenkins16:42
jeblairclarkb: (though it would mean quite a bit more work for jenkins)16:43
clarkbI like the idea. Possibly try to find better performing nodes if we can test that quickly and have a decent understanding of what to look at16:43
jeblairclarkb: yeah, could put anything in the test.  though i was thinking "echo ok" for now.16:43
*** nicedice_ has joined #openstack-infra16:44
*** morganfainberg|a is now known as morganfainberg16:44
clarkbya performance testing probably won't happen any time soon16:44
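The nodepool idea is to run a trivial job on each new node before putting it into service; for now that job is just `echo ok`. Below is a Python sketch of that gate. It assumes a hypothetical run_on_node callable that returns (exit status, stdout). The proposed nodepool change goes through Jenkins, which this sketch does not model.

```python
import subprocess


def node_passes_smoke_test(run_on_node) -> bool:
    """Run a trivial command on a fresh node; only use the node if it prints 'ok'.

    `run_on_node` is a hypothetical callable: cmd -> (exit_status, stdout).
    """
    status, out = run_on_node("echo ok")
    return status == 0 and out.strip() == "ok"


def run_locally(cmd: str):
    """Stand-in runner for trying the check: executes cmd on this machine."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout
```

Swapping `echo ok` for a heavier command would turn the same hook into the performance check clarkb mentions, at the cost of more load on Jenkins.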
jeblairclarkb: anyway, how may i help?16:44
clarkbjeblair: want to get a change ready to switch g-g-p to using git:// again?16:45
pleia2clarkb: time output is in the etherpad (1.7.12 is faster)16:46
*** Dr01d has quit IRC16:46
HenryGclarkb: Googling "<text>" turned up some useable results for carefully chosen <text>. YMMV.16:46
jeblairclarkb: ack16:46
clarkbpleia2: cool. want to try cloning from using both client versions of git?16:47
clarkbpleia2: I expect 1.7.1 to fail16:47
pleia2on it16:48
jeblair04:41 < clarkb> jeblair: my git plan. 1. spin up new servers 2. replicate from gerrit to new servers. 3. merge change to use git:// in g-g-p 4. merge haproxy change 5. merge change to add haproxy nodes16:48
*** vogxn has left #openstack-infra16:48
*** mrodden has joined #openstack-infra16:49
clarkbjeblair: that is still the plan, though at this point I expect 4 and 5 to be one change16:49
jeblairclarkb: why not do 3 last?16:49
clarkbjeblair: I was thinking of propogation delay of the JJB update16:49
clarkbjeblair: it can be done last16:49
pleia2clarkb: ssl errors, how were you getting around this?16:49
clarkbs/propogation delay/time to run/16:49
*** jpich has quit IRC16:50
*** krtaylor has quit IRC16:50
clarkbpleia2: you have to tell git to ignore ssl errors /me looks in history for the flag16:50
clarkbpleia2: GIT_SSL_NO_VERIFY=true16:50
jeblairclarkb: ok, original plan wfm16:50
pleia2clarkb: thanks16:51
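pleia2's comparison of git 1.7.1 and 1.7.12 amounts to timing an https clone with certificate verification disabled through GIT_SSL_NO_VERIFY=true. Below is a Python sketch of that measurement. clone_env and timed_clone are hypothetical helpers, and the clone URL is whichever repository is being tested.

```python
import os
import subprocess
import tempfile
import time


def clone_env(skip_ssl_verify: bool) -> dict:
    """Environment for git; GIT_SSL_NO_VERIFY=true disables certificate checks."""
    env = dict(os.environ)
    if skip_ssl_verify:
        env["GIT_SSL_NO_VERIFY"] = "true"
    return env


def timed_clone(url: str, skip_ssl_verify: bool = False) -> float:
    """Clone url into a throwaway directory and return elapsed seconds."""
    with tempfile.TemporaryDirectory() as dest:
        start = time.monotonic()
        subprocess.run(
            ["git", "clone", url, os.path.join(dest, "repo")],
            env=clone_env(skip_ssl_verify),
            check=True,  # a failed clone (as with 1.7.1 over https) raises
        )
        return time.monotonic() - start
```

To compare client versions, run the same URL once with each git binary on PATH.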
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Switch ggp to use git://
jeblairclarkb: I'm updating the etherpad with the plan and changes16:52
*** bingbu has quit IRC16:53
*** svarnau has joined #openstack-infra16:54
jeblairclarkb: is there a replication change?16:54
clarkbjeblair: there isn't a replication change yet. I suppose that doesn't need IP addresses. jeblair can you write that one too and put it on the bottom of the current stack of 2 changes?16:55
clarkb(there will be a conflict because I put a todo in my change to do it)16:55
*** fbo is now known as fbo_away16:56
jeblairclarkb: oh, right, you said 4+5 are one change... hang on16:56
*** dina_belova has quit IRC16:57
clarkbjeblair: ya the haproxy stuff needs IP addresses so will happen after the nodes are all spun up and replicated to. But gerrit replication doesn't need IP addresses so you can get that change ready and merge it as soon as those hosts have DNS records16:57
clarkbjeblair: what was the pyyaml workaround?16:57
pleia2clarkb: yeah, after ~6 minutes it fails on 1.7.1, but 1.7.12 works (added to pad)16:57
jeblairclarkb: oh, that's what you meant by bottom.  ok, i think i'm caught up now16:57
jeblairclarkb: 'pip uninstall pyyaml'; re run puppet16:58
clarkbpleia2: awesome. I think that confirms it is client side and version related16:58
clarkbjeblair: thanks16:58
jeblairclarkb: those are the 2 changes you're talking about, right?16:59
jeblair(haproxy and xinetd)16:59
anteayaetherpad link, for those viewers at home:
clarkbjeblair: yes16:59
*** SergeyLukjanov has quit IRC17:00
clarkbgit02 is happy now. I will add its DNS record then do the other three in one batch17:00
*** dkranz has quit IRC17:01
*** dkranz has joined #openstack-infra17:01
*** dims has quit IRC17:01
*** nati_ueno has joined #openstack-infra17:02
*** dims has joined #openstack-infra17:02
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Replicate to git01-git04
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Load balance git requests.
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Swap git daemon in xinetd for service
*** morganfainberg is now known as morganfainberg|a17:05
*** BobBall is now known as BobBallAway17:05
*** SergeyLukjanov has joined #openstack-infra17:05
*** nayward has quit IRC17:06
clarkbjeblair: that looks right17:07
clarkbjeblair: pleia2's test indicates upgrading git would help in the https case should we need to go down that route17:07
jeblairclarkb: excellent.  i love plan b's.  and c's. and d's.17:08
*** thomasbiege has joined #openstack-infra17:08
clarkbjeblair: eventually we will have the whole alphabet17:09
jeblairsometimes i put j at the end, that could be confusing.17:09
anteayaright now this is on the zuul status page: Queue lengths: 50 events, 84 results. What results are being referenced here?17:10
*** svarnau has quit IRC17:10
jeblairanteaya: results from jenkins17:10
anteayalike logs?17:10
jeblairanteaya: just information as to whether the job succeeded17:10
anteayaah okay17:10
anteayasuccess, failure, lost17:11
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Adding support for the Warnings plugin
clarkboh shiny I have two git01's because of the error17:11
jeblairanteaya: when they pile up like that, it's usually because zuul either started or stopped a bunch of jobs.17:11
clarkbjeblair: do I need to explicitly delete the one in ERROR state?17:11
jeblairclarkb: yes17:11
*** lcestari has joined #openstack-infra17:11
anteayaah okay, I didn't know the Jenkins results were queued as well17:11
*** svarnau has joined #openstack-infra17:12
jeblairanteaya: zuul is almost to the point where we can get rid of that.17:12
jeblairanteaya: it looks like it was a gate reset, so those were probably abort results17:12
anteayaah okay17:12
*** svarnau has quit IRC17:12
*** svarnau has joined #openstack-infra17:13
*** wenlock has joined #openstack-infra17:13
anteayaI think I can see the gate reset in this graph:
anteayalooks like it happened 15 or 20 minutes ago17:14
* anteaya nods17:15
anteaya282 results that are queued right now, I am going with no action is required from us17:16
jeblairclarkb: $::ipaddress is a puppet fact?17:17
clarkbjeblair: yes17:18
clarkbgit01 has dns records and is puppet happy17:18
clarkbstill waiting for the error state node to go away17:18
clarkbgit04 errored as well and git03 will be ready as soon as the reboot completes17:18
*** alexpilotti has quit IRC17:18
jeblairclarkb: don't hold your breath17:18
jeblairanteaya: zuul is done launching all the jobs from the gate reset and is back processing the event and result queues again17:19
clarkblaunching a new git04. errored git04 went away faster than git0117:20
*** thomasbiege has quit IRC17:20
anteayajeblair: grand thank you17:20
clarkb1 through 3 should have DNS records and are puppet happy now. Just waiting on git0417:21
*** SergeyLukjanov has quit IRC17:22
jeblairbtw, the new image in az2 looks good (no java segfault), but i haven't deleted the old nodes in jenkins which are preventing its use17:23
clarkbjeblair: ok17:24
jeblair(as a mechanism to slow nodepool)17:24
clarkbjeblair: note that I am running all puppet on these new nodes out of the development env so that when we do merge the proposed changes the diff puppet has to deal with should be minimal or nil17:24
*** boris-42 has joined #openstack-infra17:25
jeblairclarkb: ack17:25
clarkbthe exciting puppet run will be on git.o.o though :)17:25
jeblair#status ok17:25
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs | bugs |"17:25
*** pcm_ has quit IRC17:28
anteayayay back to status ok17:28
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
*** pcrews has joined #openstack-infra17:28
*** pabelanger has joined #openstack-infra17:29
*** morganfainberg|a is now known as morganfainberg17:29
*** svarnau has quit IRC17:29
*** svarnau has joined #openstack-infra17:30
jswarrenAny thoughts on why the python26 jobs appear to be significantly slower than the python27 jobs?17:31
clarkbjswarren: there are a couple related things but the biggest factor is we have fewer slaves capable of running python26 jobs17:32
clarkbjeblair: git04 is almost ready17:32
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
*** pcm_ has joined #openstack-infra17:32
jeblairAlex_Gaynor: mind if i quote you in my slides next time i give a presentation?  :)17:32
Alex_Gaynorjeblair: sure, what'd I say?17:33
*** markmcclain has quit IRC17:33
*** thomasbiege has joined #openstack-infra17:33
clarkbjswarren: it also doesn't help that the python26 jobs do tend to take a little longer as they run on hosts with older slow git and I think running many of our tests on python26 just takes longer17:33
*** xBsd has quit IRC17:34
jeblair04:38 < Alex_Gaynor> most insane CI infrastructure I've ever been a part of17:36
Alex_Gaynorjeblair: oh, absolutely :D17:36
*** morganfainberg has left #openstack-infra17:36
clarkbgit04 is happy with puppet now17:37
*** morganfainberg has joined #openstack-infra17:37
clarkbwaiting for DNS records to resolve then I think we can prepare to replicate17:37
clarkbjeblair: ^ does approving the replication change automatically restart gerrit? if not I think we should go ahead and merge17:37
jeblairclarkb: i don't _think_ anything restarts gerrit except an upgrade17:38
*** fbo_away is now known as fbo17:38
*** jbjohnso has quit IRC17:38
jeblairclarkb: yeah, looking at the puppet, i think we're fine.17:39
clarkbjeblair: the haproxy change failed puppet lint but I can fix that when I add the balancermembers17:39
*** svarnau has quit IRC17:39
* anteaya gets ready to applaud17:40
clarkbanteaya: we are still a little ways out17:40
*** svarnau has joined #openstack-infra17:40
anteayaI'll applaud all I can17:40
clarkbgoing to wait for replication to happen completely before moving to the next step17:40
clarkbjeblair: is it not possible to SIGHUP gerrit and have it pick up those changes?17:40
clarkbiirc gerrit can pick up some config and project changes on the fly but I never remember which ones17:41
clarkbonce replicated we can do a quick set of tests to make sure 8080, 4443, and 29418 all answer to git operations17:42
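A first-pass version of that check only confirms that each backend port accepts TCP connections. An actual clone or `git ls-remote` is still needed to prove git works. The port-to-protocol mapping below (8080 http, 4443 https, 29418 git daemon) is an assumption based on the ports clarkb lists, and the helper names are hypothetical.

```python
import socket

# Assumed mapping of the ports named in the log; adjust to the haproxy change.
BACKEND_PORTS = {"http": 8080, "https": 4443, "git": 29418}


def port_answers(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_backend(host: str, ports=BACKEND_PORTS) -> dict:
    """Map each protocol name to whether its port accepted a connection."""
    return {name: port_answers(host, port) for name, port in ports.items()}
```

Running check_backend against each of git01 through git04 gives a quick table of which listeners are up before the slower clone tests.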
jeblairclarkb: i think i read in a stackoverflow question yesterday it needed a restart17:42
jeblairclarkb: gerrit restarts are fairly fast, i don't think it's a big deal17:42
clarkbjeblair: ok17:42
*** svarnau has quit IRC17:42
openstackgerritA change was merged to openstack-infra/config: Replicate to git01-git04
*** dina_belova has joined #openstack-infra17:45
*** svarnau has joined #openstack-infra17:45
jeblairyay gate priority17:45
*** SergeyLukjanov has joined #openstack-infra17:46
clarkbjeblair: do you want to kick gerrit when you think it is safe? I am going to fix the haproxy change and add the balancermembers17:47
*** thomasbiege2 has joined #openstack-infra17:47
*** ruhe has joined #openstack-infra17:48
*** svarnau has quit IRC17:48
anteayado we need a channel status update for the gerrit reset?17:48
clarkbanteaya: maybe not. as jeblair mentioned it goes really fast though occasionally people do notice17:49
anteayaI will stand by to field inquiries17:49
anteayathough folks have been really patient and supportive17:49
anteayathanks everyone17:50
jeblairclarkb: i will handle the gerrit restart17:50
*** changbl has joined #openstack-infra17:51
*** thomasbiege has quit IRC17:51
openstackgerritClark Boylan proposed a change to openstack-infra/config: Load balance git requests.
clarkbthat should pass lint and it adds the balancer members17:52
*** ^demon has joined #openstack-infra17:54
jeblair#status notice restarting gerrit to pick up a configuration change17:55
openstackstatusNOTICE: restarting gerrit to pick up a configuration change17:55
^demonjeblair: I wasn't paying attention to what channel I was in and I freaked out for a moment.17:56
^demonI was like "who's making config changes and I don't know?" :)17:56
*** ^d has quit IRC17:56
jeblair^demon: haha!17:57
uvirtbotjeblair: Error: "demon:" is not a valid command.17:57
jeblairwow, uvirtbot makes it really fun to talk to you ^demon :)17:57
jeblairneed to get gerrit to accept the new hostkeys17:58
*** thomasbiege2 is now known as thomasbiege17:58
clarkbjeblair: pleia2: Is that puppetted or do we just do it by hand?17:59
jeblairclarkb: i don't think it's puppeted17:59
*** AJaeger has joined #openstack-infra18:00
*** AJaeger has joined #openstack-infra18:00
clarkbjeblair: ya I don't see it in the site.pp node for review.o.o18:00
pleia2there is an open bug for sorting out gerrit's keys18:00
pleia2(I opened it recently)18:00
anteayaso zuul and jenkins are still working on what they had, but since gerrit is down nothing new is being queued18:00
anteayanow I see18:00
jeblairi think i may need to restart gerrit again?18:01
jeblairanteaya: gerrit is not down18:01
anteayaoh sorry18:01
clarkbjeblair: maybe? java likes to cache a lot of stuff including perhaps the known hosts file18:01
jeblairi'm going to restart gerrit again and see if it picks up the known hosts changes18:01
pleia2 for when someone is bored ;)18:02
uvirtbotLaunchpad bug 1209464 in openstack-ci "Start managing ~gerrit2/.ssh/ contents in puppet" [Undecided,New]18:02
jeblairpleia2: ++18:02
jeblair[2013-08-22 18:03:29,807] ERROR : Cannot replicate to file:///var/lib/git/stackforge/python-ipmi.git; repository not found18:03
jeblairthat's slightly disturbing18:03
clarkbjeblair: that was one of the projects that got renamed18:04
jeblairboth python-ipmi and pyghmi exist in gerrit's git repo dir18:04
*** p5ntangle has joined #openstack-infra18:06
jeblairok, the db has no python-ipmi entries18:08
clarkbso monty must've done a cp instead of a mv18:09
jeblairthere doesn't seem to be anything new in python-ipmi....18:09
jeblairwait, i wonder if manage_projects put it back18:10
jeblairbecause it's actually quite old18:10
mtreinishjeblair: quick question: do I need to do another reverify on:
*** svarnau has joined #openstack-infra18:11
mtreinishbecause I don't see it in the gate pipeline18:11
clarkbmtreinish: yes I think so18:11
jeblairclarkb: nah, projects.yaml looks right; probably a cp then.  so i'll stop gerrit and mv it out of the way18:11
clarkbjeblair: ok18:11
mtreinishclarkb: ok thanks18:11
jeblair#status notice stopping gerrit to correct a stackforge project rename error18:12
openstackstatusNOTICE: stopping gerrit to correct a stackforge project rename error18:12
*** dmakogon_ has joined #openstack-infra18:13
jeblairthis may make zuul unhappy, it's in the middle of a gate reset18:13
clarkbjeblair: up to you if you want to wait18:13
clarkbreplication does appear to be happening for everything else18:13
jeblairit is done18:14
*** mrodden has quit IRC18:14
clarkbhttp and git:// seem to be working on git01 but not https. looking into that now18:16
clarkbpleia2: jeblair did you guys want to try cloning from the other hosts?18:16
jeblairclarkb: can do18:16
jeblairclarkb, pleia2: gerrit replication is still running18:17
ttxlifeless: as long as it gets fixed sometime in the next two months (and stays fixed), we should be good18:17
jeblairwe might want to wait until that finishes18:17
clarkbjeblair: ok18:17
mtreinishclarkb: do again now, because I did it right before the gerrit restart?18:18
clarkbmtreinish: gerrit restart shouldn't affect you (zuul should get that event quick enough)18:18
mtreinishclarkb: ok18:18
anteaya33 events in the zuul queue18:18
ttxjeblair: the gate looks calmer today. Anything special you've done ? Just arrived18:19
anteayamtreinish: when zuul has 0 events, your patch should show up on the status page18:19
anteayattx kept zuul running overnight18:19
mtreinishanteaya: ok18:19
anteayazuul had a bug which jeblair fixed last night18:20
jeblairttx: i fixed a zuul bug last night (which was causing us to restart zuul a lot with nothing merging)18:20
* ttx is still trying to understand the patterns that govern gate load18:20
*** melwitt has joined #openstack-infra18:20
* anteaya too18:20
jeblairttx: we're working on load balancing git.o.o so that we can serve git repos to all the jobs we need to run18:20
clarkberror: gnutls_handshake() failed: A TLS warning alert has been received. while accessing is what I got speaking https to git0118:20
anteayattx can't hurt to read this:
ttxjeblair: is that the new bottleneck ?18:21
uvirtbotLaunchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in" [Undecided,New]18:21
ttxanteaya: thx for the pointer18:21
*** zul has quit IRC18:21
jeblairttx: yes; we're actually keeping our slave count artificially low to try to stress it less (but we still get occasional errors)18:21
jeblairttx: once that's scaled out, we should be able to run a lot more tests at once, which should help with backlogs18:22
jeblairttx: (there are a few jenkins errors we've encountered as well that we need to work around; that's next up)18:22
*** mrodden has joined #openstack-infra18:22
ttxjeblair: thx for the executive summary :)18:22
jeblairttx: np18:22
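Load balancing git.o.o across git01 through git04 means each incoming request is handed to one of the backends. The haproxy change's actual balancing algorithm does not appear in this log. The Python below is only a toy round-robin balancer that skips unhealthy backends, and its names are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Toy round-robin balancer: each request goes to the next backend in turn."""

    def __init__(self, backends):
        self._count = len(backends)
        self._ring = cycle(backends)

    def pick(self, is_healthy=lambda backend: True) -> str:
        """Return the next healthy backend, trying each backend at most once."""
        for _ in range(self._count):
            backend = next(self._ring)
            if is_healthy(backend):
                return backend
        raise RuntimeError("no healthy git backends")


BACKENDS = ["git01", "git02", "git03", "git04"]
```

Spreading requests this way is what lets the slave count grow: each backend sees roughly a quarter of the clone load that git.o.o was carrying alone.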
clarkbjeblair: I think it is related to the hostname and the cert. GIT_SSL_NO_VERIFY isn't letting it through, though, as it happens in the handshake. Speaking directly to the ip works18:24
clarkbI am going to test with a hacked up /etc/hosts18:24
*** erfanian has quit IRC18:24
clarkbhacked up /etc/hosts makes it better18:28
*** sarob has joined #openstack-infra18:29
*** sarob has quit IRC18:31
*** alexpilotti has joined #openstack-infra18:31
*** sarob has joined #openstack-infra18:31
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add git01-git04 to cacti
clarkbjeblair: and git.o.o too18:32
jeblairclarkb: fungi did that yesterday (or the day before)18:32
clarkbcool I missed that18:33
jeblairclarkb: why don't you push that one through real quick?18:33
jeblairclarkb: yeah, it's telling18:33
jeblairclarkb: it's how we knew we were cpu bound and not io or network18:33
clarkbgit01 is serving files over all three protocols. I just have to give it an IP address for https otherwise tls handshaking complains18:34
clarkbjeblair: gotcha18:34
clarkbI am going to test git02 now18:34
jeblairok, i'll test 3 and 4 with the ip18:34
*** mrmartin has joined #openstack-infra18:34
jeblairclarkb: you mean with etc hosts, right?18:35
jeblairoh, or use the ip but set the no verify var?18:35
jeblairyeah, that seems to work18:35
*** sarob has quit IRC18:35
clarkbjeblair: IP and no verify or you can verify and put the ip address in /etc/hosts for git.o.o18:37
*** krtaylor has joined #openstack-infra18:37
jeblairclarkb: a nova clone took real    2m33.517s18:38
jeblairover https18:38
anteaya7 jobs in post, yay!18:38
clarkbI got one Timeout waiting for output from CGI script /usr/libexec/git-core/git-http-backend on git0218:39
clarkbis that timeout something we can extend?18:39
jeblairclarkb: why did you get a timeout from a server under no load?18:39
clarkbjeblair: I am cloning http https and git:// concurrently so it has some load18:39
anteayalook at all the merges in the last hour:
openstackgerritA change was merged to openstack-infra/config: Add git01-git04 to cacti
*** pabelanger has quit IRC18:40
jeblairoh, i was cloning from 02, sorry; i guess that could explain the time18:40
*** pblaho has joined #openstack-infra18:40
jeblairbut no, 03 and 04 are taking forever too18:42
clarkbjeblair: there is some delay as git has to do pack files and things18:43
clarkbload doesn't look terrible on 0318:43
jeblair03 took 2m15.276s18:43
jeblairclarkb: i just started another clone18:43
jeblairclarkb: i believe we were shooting for <1 min, yeah?18:44
clarkbjeblair: yeah, but really only for git://18:44
jeblairclarkb: i did all of my tests with https; and this is on a precise node18:44
clarkboh I see18:45
jeblairi think the refs aren't packed at all18:45
clarkbjeblair: ya18:45
jeblairso somehow the git.o.o repos ended up with packed refs, but not these.  i'm testing if that's the diff.18:45
* anteaya tries to pick the best time for her 1 hour afternoon walk18:46
clarkb02 is 1:42 git clone nova over git protocol18:47
clarkbnova repo on 02 has one pack file and a bunch of loose files18:47
clarkbI think you are onto something with that theory18:48
jeblairclarkb: i'm looking at refs, not objects18:48
*** thomasbiege has quit IRC18:48
jeblairclarkb: after a 'git gc' (which did both objects and refs), it's real    0m52.021s18:49
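The refs-versus-objects distinction jeblair draws can be checked directly in a repository. The Python sketch below counts loose ref files under refs/ and entries in the packed-refs file. It assumes the standard packed-refs layout (an optional "# pack-refs with:" header, one "<sha> <refname>" line per ref, and "^<sha>" lines holding the peeled commit of the preceding annotated tag). The helper names are hypothetical. A Gerrit-managed repo with many refs/changes/ entries stored as loose files would show a large loose count here.

```python
from pathlib import Path


def parse_packed_refs(text: str) -> dict[str, str]:
    """Parse packed-refs contents into {refname: sha}.

    Header lines ('#') and peeled-tag lines ('^<sha>') are skipped.
    """
    refs: dict[str, str] = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def ref_storage_summary(git_dir: str) -> tuple[int, int]:
    """Return (loose, packed) ref counts for a bare repo or .git directory."""
    root = Path(git_dir)
    # Every regular file under refs/ is one loose ref.
    loose = sum(1 for p in (root / "refs").rglob("*") if p.is_file())
    packed_file = root / "packed-refs"
    packed = len(parse_packed_refs(packed_file.read_text())) if packed_file.exists() else 0
    return loose, packed
```

A loose ref and a packed ref with the same name can coexist (the loose one wins), so the two counts are not necessarily disjoint.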
clarkbjeblair: should we add a daily/weekly cronjob to git gc?18:51
jeblairclarkb, pleia2: does cgit do repo maintenance, or do we have a cron defined?18:51
*** nayward has joined #openstack-infra18:51
pleia2jeblair: it does not18:51
Alex_Gaynorso is there a reason we don't use --depth 118:51
jeblairhow did we end up with a packed repo state?18:51
pleia2jeblair: it's really just a web interface that accesses the repo, doesn't do much else18:51
pleia2jeblair: maybe that's how it's replicated?18:51
clarkbAlex_Gaynor: there is a reason and I always forget what it is18:52
jeblairAlex_Gaynor: that's used to build an image, then the full repo is available at basically no cost (which is useful because tests can run on any branch)18:52
Alex_Gaynorjeblair: ah ok, so it's in an image, that was the missing bit in my mind18:52
jeblairAlex_Gaynor: yep.  mordred was pointing out that in devstack-gate itself (in the wrap script) we could possibly be doing something smarter than 'git remote update'18:53
jeblairAlex_Gaynor: but we need to be careful that whatever we change there doesn't transfer load to the zuul server (where the actual test refs are served)18:53
jeblairpleia2: the repos that were just replicated look just like the gerrit repos18:54
pleia2ah, hrm18:54
clarkbmaybe the cgit package comes with a cron to do it?18:55
jeblairbtw, https clone from review.o.o is real    1m0.056s18:56
jeblair(using the local mirror, not gerrit)18:56
clarkbso we are on par with that18:57
jeblairclarkb: _if_ we pack refs18:57
jeblairon the mirror18:57
jeblairpacked refs only (not a gc): real    0m46.005s18:58
*** hartsocks has joined #openstack-infra18:58
jeblairthat's actually faster than the gc18:58
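The clone timings quoted so far can be compared numerically with the small Python helper below, which converts bash `time` "real" values such as 2m33.517s into seconds. The labels are a paraphrase of the conversation, and the helper names are hypothetical. The runs were made on different servers under different load, so the ratios are only a rough comparison.

```python
import re

_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '2m33.517s' to seconds."""
    m = _TIME_RE.match(value.strip())
    if m is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# nova clone timings as reported in the channel (labels paraphrased).
TIMINGS = {
    "git02 https, under load": "2m33.517s",
    "git03 https": "2m15.276s",
    "review.o.o mirror https": "1m0.056s",
    "after git gc": "0m52.021s",
    "pack-refs only": "0m46.005s",
}

if __name__ == "__main__":
    baseline = to_seconds(TIMINGS["git03 https"])
    for label, value in TIMINGS.items():
        secs = to_seconds(value)
        print(f"{label:26} {secs:8.3f}s  {baseline / secs:5.2f}x vs git03")
```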
bnemecI'm seeing a couple of changes that have no Jenkins score and aren't showing up on the status page.18:59
anteayathe git-fetch-pack command allows you to specify <refs>:
bnemecShould I go ahead and recheck them?18:59
anteayaah shoot - that is 1.8.318:59
jeblairbnemec: yep19:00
bnemecjeblair: Okay, thanks.  I didn't want to drive any extra load unnecessarily.19:00
anteayabnemec: more explanation here:
uvirtbotLaunchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in" [Undecided,New]19:00
clarkbjeblair: speaking of zuul. Does the zuul process that is currently running catch SIGUSR2 properly?19:00
jeblairclarkb: so i think we should 'git pack-refs --all' nightly on the mirrors19:00
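A nightly job like the one jeblair proposes could look roughly like the Python sketch below. The actual change is a puppet-managed cron entry whose contents are not shown here. This script, its mirror root (/var/lib/git), and its helper names are hypothetical. It finds bare repositories and runs `git --git-dir=<repo> pack-refs --all` in each one.

```python
import subprocess
from pathlib import Path

# Hypothetical mirror root; the real path on the git servers may differ.
MIRROR_ROOT = "/var/lib/git"


def find_bare_repos(root: str) -> list[Path]:
    """Return directories under `root` that look like bare git repos."""
    repos = []
    for head in sorted(Path(root).rglob("HEAD")):
        repo = head.parent
        # Filters out logs/HEAD and stray files named HEAD.
        if (repo / "objects").is_dir() and (repo / "refs").is_dir():
            repos.append(repo)
    return repos


def pack_refs_commands(repos: list[Path]) -> list[list[str]]:
    """Build one 'git pack-refs --all' command per repository."""
    return [["git", f"--git-dir={repo}", "pack-refs", "--all"] for repo in repos]


def main(root: str = MIRROR_ROOT, dry_run: bool = False) -> None:
    for cmd in pack_refs_commands(find_bare_repos(root)):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing repository.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

A hypothetical crontab line such as `0 4 * * * /usr/local/bin/pack_mirror_refs.py` would run it nightly. Running with `dry_run=True` first shows which repositories would be touched.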
dansmithjeblair: I haven't been rechecking things much since a lot of things seem to be failing all tests due to package fetch timeouts or something like that19:00
jeblairclarkb: yes, i restarted with both of those changes19:00
dansmithjeblair: is that just my imagination?19:01
bnemecjeblair: Oh, that's embarrassing.  I even saw that link earlier.19:01
clarkbjeblair: I agree19:01
jeblairdansmith: nope, we're working on that now19:01
anteayadansmith: no, that is the git issue we are working on19:01
anteayadansmith: not your imagination19:01
anteayabnemec: no worries19:01
jeblairclarkb: i'll write that change real quick?19:01
dansmithokay, I figured, but also figured more rechecks weren't likely to help :)19:02
clarkbjeblair: go for it19:02
*** sarob has joined #openstack-infra19:02
clarkbjeblair: base it atop my haproxy change19:02
*** lbragstad has left #openstack-infra19:02
anteayadansmith: not right now, but you are free to spin the wheel and take your chances like everyone else19:02
clarkbjeblair: so that we can continue using the development env until we actually turn haproxy on19:02
dansmithanteaya: hah, okay :P19:02
jeblairclarkb: i think later we may want to swing back around and look into using a newer git on these servers19:02
clarkbjeblair: ++19:02
jeblairclarkb: because perhaps the newer git can deal with unpacked refs better19:02
jeblairclarkb: but i think i'm fine with packed refs in a mirror19:03
clarkbjeblair: any idea how packed refs like that will affect fetches of a few refs?19:03
clarkbdoes git unpack them and give you just what you want?19:03
jeblairclarkb: it's just the list of refs19:03
clarkboh right you are packing refs. I keep thinking objects19:04
jeblairclarkb: for use when git advertises what refs it has19:04
clarkbobjects != refs and I need to beat that into my brain19:04
clarkbI am going to find some food really quick. I smell it so I won't be gone long19:04
anteayahappy food clarkb19:04
pleia2yes, lunch19:05
reedget good food19:05
anteayahappy lunch pleia219:05
mrmartinjeblair: if you have some free minutes, please review it is blocking task in the groups portal. thnx!19:05
reedwhat's the current estimate for this patch to land somewhere?
uvirtbotLaunchpad bug 1179526 in horizon "source_lang in Horizon repo is overwritten by Transifex" [High,Confirmed]19:05
reedno, not that19:05
reedthis one
anteayais it in the queue, reed?19:06
reedanteaya, waiting for review19:06
anteayasorry, no it isn't - I'm focused on queue questions, sorry19:06
jeblairmrmartin: this week is very unusual -- we're having a lot of load problems because we have a feature freeze this week, and we only have 2 infra team members working19:07
jeblairmrmartin: as soon as we have things working reliably again, i will review that patch and reed's as well19:07
mrmartinjeblair: ok, maybe on monday?19:07
anteayaboth linked to the same patch19:07
jeblairmrmartin: certainly by monday19:08
jeblairmrmartin: did you see the instructions for running that locally on a test server?19:08
mrmartinjeblair, I tested it in a local vm19:08
jeblairmrmartin: even if we haven't merged that and launched the real server yet, i wanted to make sure you can work on it locally19:08
jeblairmrmartin: okay, great19:08
mrmartinwas working in the test env, but you know, it doesn't mean that everything will be perfect on prod :D19:09
anteayatrue, but it is a very good start19:09
jeblairyep :)19:09
*** CaptTofu_ has quit IRC19:09
*** CaptTofu has joined #openstack-infra19:10
anteayazuul reports 0 events, yay19:10
*** ^demon is now known as ^demon|lunch19:10
*** sarob has quit IRC19:10
anteaya24 gate, 1 post, 53 check19:10
anteayaalmost manageable again19:11
*** wenlock has quit IRC19:11
*** CaptTofu has quit IRC19:14
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add a mirror repack cron to git servers
jeblairI'm going to run the repack on all of the git servers, then eat.19:15
hartsocksHi. I think my account on is screwed up. Is this the correct channel for that?19:16
clarkbhartsocks: yes, what problem are you seeing19:17
clarkbthe food is a lie. I need to wait a little longer on it19:17
anteayajeblair: sounds good19:17
anteayaclarkb: k19:17
hartsocksclarkb: Example:
hartsocksThe review system has decided I don't own about half my patches. Sometimes I show up as "hartsock" and other times as "hartsocks" I don't know why. Who do I ask about this? I've tried to get help on the mailing list in the past.19:18
anteayaclarkb: does the db have both a hartsock and a hartsocks?19:18
clarkbanteaya: checking19:18
hartsocksmy preference would be to fold everything into 'hartsocks'19:19
anteayahartsocks: great19:19
clarkbhartsocks: yeah you have two accounts19:20
hartsockscan you just fold everything to hartsocks?19:20
anteayahartsocks do you have any repos with connections to gerrit? you might need to delete the remote branch and create a new remote branch to gerrit with `git review -s`19:21
anteayato ensure you don't have any headed for hartsock19:21
reedstupid mailman19:21
*** thomasbiege has joined #openstack-infra19:21
clarkbhartsocks: first a little background on why this appears to have happened19:21
clarkbhartsocks: you have logged into gerrit with two different launchpad accounts19:22
anteayareed an app called mailman or the human being holding your mail?19:22
clarkbhartsocks: and if you push code with two different usernames changes will be attached to two different accounts19:22
hartsocksclarkb: whoops :-/19:22
clarkbhartsocks: if you want to be hartsocks you should login with your vmware launchpad account19:22
reedanteaya, the python code that delivers email19:22
hartsocksclarkb: The only actions have been git actions that seem to screw up.19:23
clarkbactually I take that back, both accounts see acm and vmware email19:23
anteayareed: ah okay, stupid python code that delivers email19:23
*** SergeyLukjanov has quit IRC19:23
hartsocksclarkb: I must have a git repo that was set up 'hartsock'19:23
clarkbhartsocks: that will do it19:23
hartsocksclarkb: I will go through them all and make sure they are 'hartsocks'19:23
clarkbhartsocks: you can set gitreview.username in your global git config to set it globally19:23
clarkbhartsocks: then make sure you don't have any local overrides19:24
hartsocksclarkb: I'm guessing that's in .git/config locally19:24
anteayahartsocks: yes19:24
clarkbhartsocks: ~/.gitconfig but setting it with the git config command is preferred. `git config --global gitreview.username hartsocks`19:24
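clarkb's advice to set the username globally and then make sure no local override exists can be checked with the sketch below. It relies on `git config --list`, which prints settings from the system, global, and local files in that order, so the last value listed for a single-valued key is the one git uses. The helper names are hypothetical.

```python
import subprocess
from typing import Optional


def config_values(listing: str) -> dict[str, list[str]]:
    """Group 'key=value' lines from `git config --list` by key, in order."""
    values: dict[str, list[str]] = {}
    for line in listing.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip keys printed without a value
            values.setdefault(key, []).append(value)
    return values


def review_username(listing: str) -> tuple[Optional[str], bool]:
    """Return (effective gitreview.username, whether conflicting values exist)."""
    names = config_values(listing).get("gitreview.username", [])
    effective = names[-1] if names else None
    return effective, len(set(names)) > 1


def current_listing(repo_dir: str = ".") -> str:
    """Run `git config --list` inside `repo_dir` and return its output."""
    return subprocess.run(
        ["git", "config", "--list"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
```

If `review_username(current_listing(path))` reports a conflict for a checkout, that repository's .git/config is overriding the global setting.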
reedpleia2, mordred, jeblair: when you approve my message to infra mlist please whitelist also stefano+infra@openstack  as allowed email19:25
hartsocksclarkb: thanks19:25
pleia2reed: will do, sec19:25
clarkbhartsocks: rolling stuff under the other name into hartsocks is probably possible, but this is a busy week and if you can live with those being wrong until they get merged or die that would probably be easiest19:25
clarkbI am also not sure if we have updated changes and the like in the past19:26
clarkbmay not be possible19:26
hartsocksclarkb: now that I know what's going on I can live with that for a while.19:26
hartsocksclarkb: just want my karma points that's all :-)19:26
anteayamy local .git changes are in .git/config and I put them there with the git config command19:26
*** thomasbiege has quit IRC19:27
hartsocksclarkb: (I know the points don't matter.)19:27
anteayajust like on Whose Line19:27
*** ruhe has quit IRC19:27
*** xBsd has joined #openstack-infra19:28
bnemecDibs on being OpenStack's Ryan Stiles.19:29
bnemecI've even got the requisite height. ;-)19:30
anteayaas a Canadian I'd like to try for Colin Mochrie19:30
anteayabut my gender might be a hindrance19:30
anteayaand I'm not bald19:30
bnemecIt's all good - half the time they had him playing female characters anyway. ;-)19:31
bnemecAlthough the inability to make bald jokes would definitely be a problem. :-D19:31
anteayaI can wear a swim cap19:32
*** vipul is now known as vipul-away19:32
anteayaI'm out sick for the Richard Simmons episode though19:32
bnemecBah, what fun is that? :-P19:33
anteayait's all you Ryan19:33
anteayapleia2: are you still lunching?19:34
anteayaI'm trying to find a space for some exercise19:34
pleia2anteaya: I'm back-ish :)19:35
anteayaI can wait19:35
anteayalet me know when you are back19:35
*** xBsd has quit IRC19:35
*** HenryG has quit IRC19:36
pleia2anteaya: I'm back19:36
anteayaokay great19:36
*** sarob has joined #openstack-infra19:37
anteayathanks, off for a walk I expect to be back in an hour19:37
*** yolanda has quit IRC19:38
*** sdake_ has quit IRC19:39
jeblairokay, pack-refs has completed on all the git servers19:40
*** sarob has quit IRC19:41
jeblairreal    0m40.868s19:42
jeblairclone time for nova on 0319:42
*** emagana has joined #openstack-infra19:43
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
clarkbjeblair: nice19:43
* clarkb reviews the change to put that in place everywhere19:43
jeblairclarkb: are you back and ready to proceed, or killing time around lunch?19:44
clarkbjeblair: I should technically kill more time around lunch because the food I smell hasn't made it to the scavenging grounds yet19:45
clarkbjeblair: but I am also impatient. I think we should continue if you don't need more time for food19:46
*** wenlock has joined #openstack-infra19:47
clarkbjeblair: I am fetching your cron change into the puppet development env19:47
jeblairclarkb: we can wait, i think we're getting to the point where we don't want to be interrupted19:47
clarkbjeblair: ok19:47
clarkbI will pull the change into that repo and possibly just find a sandwich19:48
clarkbto speed things along19:48
*** p5ntangle has quit IRC19:48
clarkbjeblair: two things to note before I afk for a few minutes. The haproxy git:// queue and conn numbers may need changing and we may need to change the default balance type to source to accommodate lag in replication across the different servers19:49
*** p5ntangle has joined #openstack-infra19:49
clarkbjeblair: the current balance method is round robin and git http by default can open up to five connections.19:49
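The replication-lag concern is easier to see with a toy model of the two balance methods. haproxy's documentation describes `balance source` as hashing the client address and dividing by the total weight of the running servers. The sketch below imitates that idea with SHA-256, which is not haproxy's actual hash, and uses hypothetical server names. Under round robin, successive connections from one client spread across backends that may be at different replication points. Under source hashing, they all go to the same backend.

```python
import hashlib
import itertools
from typing import Iterator

SERVERS = ["git01", "git02", "git03", "git04"]  # hypothetical backend names


def round_robin(servers: list[str]) -> Iterator[str]:
    """Hand out servers in turn, ignoring the client ('balance roundrobin')."""
    return itertools.cycle(servers)


def source_pick(client_ip: str, servers: list[str]) -> str:
    """Pick a server from a hash of the client address ('balance source').

    SHA-256 is used here only for illustration; haproxy has its own hash,
    and it also takes server weights into account.
    """
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


if __name__ == "__main__":
    rr = round_robin(SERVERS)
    print("roundrobin:", [next(rr) for _ in range(5)])
    print("source:    ", [source_pick("198.51.100.7", SERVERS) for _ in range(5)])
```

The trade-off clarkb raises later still applies: if many clients share a small address range, a hash over full addresses should spread them, but per-client pinning also means one busy client cannot be rebalanced.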
*** SergeyLukjanov has joined #openstack-infra19:51
jeblairclarkb: we have cacti graphs for 01-0419:53
*** gyee has quit IRC19:57
*** CaptTofu has joined #openstack-infra19:58
*** sarob has joined #openstack-infra20:00
*** pcm_ has quit IRC20:00
*** nayward has quit IRC20:01
*** dina_belova has quit IRC20:03
*** dina_belova has joined #openstack-infra20:04
*** sdake_ has joined #openstack-infra20:06
*** sdake_ has quit IRC20:06
*** sdake_ has joined #openstack-infra20:06
*** nati_uen_ has joined #openstack-infra20:09
clarkbjeblair: woot.20:09
clarkbjeblair: sandwich was good. ready whenever you are20:09
jeblair1 sec20:09
jeblairso shall we merge the git:// change now?20:11
*** p5ntangle has quit IRC20:11
jeblairclarkb: i'll let you do that since you haven't reviewed it20:11
clarkbjeblair: the gate is undergoing a reset. should we wait a little bit for that?20:11
clarkbor just power through?20:11
*** nati_ueno has quit IRC20:12
jeblairclarkb: power through20:13
jeblairit'll be done by the time that gets merged20:13
clarkbok merging 43315 now20:13
*** ^demon|lunch is now known as ^d20:13
clarkbthe zuul results queue is large again20:15
clarkbbut that may just be a side effect of cancelling a bunch of stuff20:15
jeblairclarkb: yep20:15
*** lcestari has quit IRC20:22
*** dina_belova has quit IRC20:25
*** SergeyLukjanov has quit IRC20:25
*** vipul-away is now known as vipul20:25
*** jbjohnso has joined #openstack-infra20:26
*** nati_uen_ has quit IRC20:26
*** danger_fo_away is now known as danger_fo20:27
clarkbjeblair: still waiting to get queued. Should I go ahead and force submit the change?20:27
jeblairclarkb: yeah, it's about 2/3 through reconfiguring the reset.  let's not wait.20:28
openstackgerritA change was merged to openstack-infra/config: Switch ggp to use git://
clarkbjeblair: I am going to run a puppet agent --noop20:29
*** sarob_ has joined #openstack-infra20:29
clarkbas a quick sanity check but then we should be ready to apply the haproxy stuff to git.o.o20:30
clarkbjeblair: it looks clean to me. should I go ahead and run puppet for real? are you ready?20:32
*** sarob_ has quit IRC20:32
*** sarob_ has joined #openstack-infra20:32
*** sarob has quit IRC20:33
jeblairclarkb: yep20:33
* clarkb pushes the go button20:34
clarkbjeblair: can I have you check the ip6tables rules after puppet is done on git.o.o? I noticed some weirdness there yesterday and think our iptables module may not be completely happy on centos20:34
clarkbpuppet is still running. I will let you know when to check20:35
jeblairclarkb: what was weird?20:37
clarkbjeblair: it didn't pick up the new 4443 29418 and 8080 rules. but I kicked it by hand and that seemed to work20:37
jeblairthat seems to be the case again.  probably a puppet bug20:37
jeblairclarkb: how's that puppet run?20:38
jeblairwe're starting to fail jobs20:38
clarkbpuppet is done running. haproxy is up20:39
* clarkb checks a local clone really fast20:39
clarkblocal clone of nova via git:// works20:39
jeblairi just did an https clone from home20:39
* clarkb looks in the haproxy log for anything crazy looking20:40
jeblair(i'm cloning zuul, not nova though so i don't impact the server)20:40
jeblairgit and http work too20:40
jeblairclarkb: how do we examine haproxy state?20:41
clarkbjeblair: /var/log/haproxy.log20:41
clarkbjeblair: is the log20:41
pleia2git is still running from xinetd, right?20:41
clarkbI think it opens a socket somewhere that you can talk directly to as well /me finds that20:41
jeblairany way to see the current connection count, distributions20:41
clarkbpleia2: no this includes your daemon change20:41
pleia2clarkb: oh ok, great20:41
clarkbjeblair: good question. I am looking for that socket now20:42
*** pblaho has quit IRC20:42
clarkbjeblair: on /var/lib/haproxy/stats20:43
jeblairpleia2: would you mind writing a change to add the 'socat' package to the git servers?20:44
*** woodspa has quit IRC20:44
jeblairi installed it manually on git.o.o20:44
pleia2jeblair: sure, on it20:44
clarkbjeblair: out of curiosity what command(s) are you running against that socket?20:44
*** danger_fo is now known as danger_fo_away20:45
jeblairclarkb: this looks useful
jeblair(i've pasted in more output into the etherpad)20:47
psedlakhi, how is the issue with python-*client dependency collisions solved for stable/grizzly branch? i've tried to get similar env at my machine and nova failed to start at all due to wrong versions of keystoneclient ... :/20:49
psedlak*similar env as gate-devstack-tempest-vm-full for stable/grizzly20:49
*** jjmb has joined #openstack-infra20:49
openstackgerritElizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add socat package to cgit servers
anteayapsedlak: we are doing a little internal work now, and the ones able to answer your question need to focus on their fix right now20:51
*** dkranz has quit IRC20:51
anteayapsedlak: if you have a link to a bug report or patch I can look at it, if you want20:51
clarkbjeblair: are you seeing any rampant failure?20:51
clarkbjeblair: best as I can tell we are mostly up20:51
psedlakanteaya: you mean it's not best time for it now ... should i ask later/tomorrow?20:52
anteayayou can try20:52
anteayahave you tried in -dev or -nova yet?20:52
anteayathe keystone folks hang out in -dev20:52
jeblairclarkb: nope, afaict, we seem to be distributing across all servers20:52
jeblairecho "show errors" |socat stdio /var/lib/haproxy/stats20:52
jeblairis empty20:52
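The `echo ... | socat stdio /var/lib/haproxy/stats` pattern above can also be done from Python over the UNIX socket. In non-interactive mode, haproxy answers one command and then closes the connection. The sketch below sends a command such as `show stat` and pulls per-server current session counts (the `scur` column) out of the CSV reply to show how connections are distributed. The socket path comes from the conversation. The helper names are hypothetical.

```python
import csv
import io
import socket

STATS_SOCKET = "/var/lib/haproxy/stats"  # path mentioned in the channel


def stats_command(command: str, path: str = STATS_SOCKET) -> str:
    """Send one command to haproxy's stats socket and return the reply.

    Roughly equivalent to: echo "<command>" | socat stdio <path>
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # haproxy closes after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def current_sessions(show_stat_csv: str) -> dict[tuple[str, str], int]:
    """Map (proxy, server) to current sessions from 'show stat' CSV output."""
    # The header line starts with '# pxname,svname,...'.
    text = show_stat_csv.lstrip("# ")
    rows = csv.DictReader(io.StringIO(text))
    return {(row["pxname"], row["svname"]): int(row["scur"] or 0) for row in rows}
```

On the load balancer, `current_sessions(stats_command("show stat"))` would give a quick view of how many connections each backend is holding.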
*** jjmb1 has joined #openstack-infra20:53
psedlakanteaya: no, not yet as on gate it reinstalls (at least keystoneclient, but maybe also others) multiple times during setup (devstack) ... and there are clearly incompatible reqs
jeblairclarkb: since we broke the gate queue during the hup, that actually stopped the gate reset20:54
jeblairpsedlak: in master, we are now forcing the requirements specified in openstack/requirements to be installed20:54
clarkbjeblair: there are a handful of could not get file errors due to the lack of no .git to .git translation but far fewer than we seemed to have in the past20:54
anteayapsedlak: yeah, let's move to -dev and see if some -qa folks are around20:54
*** jjmb has quit IRC20:54
jeblairpsedlak: i believe devstack has code to do that; i'm not sure if all of that has been backported to grizzly yet, but it's under consideration at least (if it hasn't been done)20:54
jeblairpsedlak: you might ask dtroyer20:55
jeblairclarkb: it's possible those errors are due to a smart http request failing during the hup20:55
psedlakjeblair: ok, thanks20:55
clarkbjeblair: that is possible20:56
jeblairclarkb: maybe give it a few mins and if they continue, start to worry? :)20:56
* jeblair adds a new tree to cacti20:56
clarkbjeblair: can you add one for logstash + elasticsearch if you are collapsing things together?20:57
*** thomasbiege has joined #openstack-infra20:57
jeblairclarkb: let me do that later; i'm just going to add a quick collection of graphs for git now; later i'll add a single graph that graphs multiple hosts; i'll do logstash then too20:57
clarkbok wfm20:57
*** thomasbiege has quit IRC20:57
*** mrmartin has quit IRC20:58
*** apcruz has quit IRC20:59
anteayaa few cpus finally taking a smoke break21:01
*** gyee has joined #openstack-infra21:01
jeblairanteaya: heh, ie, smoking less?21:01
anteayayeah, taking a break from smoking21:02
*** hartsocks has left #openstack-infra21:02
anteayawhat a difference21:02
anteayathe greens are so similar, they aren't even in nice, they went right to idle21:03
jeblairanteaya: you should be able to tell the difference on the graph; if you look at zuul, youll see nice time.21:03
*** CaptTofu has quit IRC21:04
jeblairbut yeah, nothing is nice here21:04
clarkbI think the gate reset is part of it21:04
*** jjmb1 has quit IRC21:04
anteayayeah look at those idle numbers21:04
jeblairclarkb: yes, git is basically idle now21:04
anteayaclarkb: okay, I'll see if I can see what happens on a gate reset21:04
*** CaptTofu has joined #openstack-infra21:04
jeblairwe're back at one job at a time until the next reset21:05
anteayaone job? one what kind of job - one git clone job?21:06
anteayalooks like a gate reset coming up21:07
jeblairanteaya: well, we don't usually clone things, but yes, since all nodes are now occupied, they will each just pick up a new jenkins job (which will perform some git action) one at a time as they finish21:07
anteayaah okay, I think I understand21:07
jeblairanteaya: without a new error, we're 17 minutes away from a gate reset21:07
*** sarob_ has quit IRC21:08
jeblairopenstack/nova 42435,7 is the first change with a failed job in the gate (at the moment)21:08
anteayaokay, can I see that on
anteayaah okay21:08
*** sarob has joined #openstack-infra21:08
*** CaptTofu has quit IRC21:08
anteayaright, a failed voting job21:09
anteayaI see it21:09
*** changbl has quit IRC21:10
anteayawhat is expected to happen at the next gate reset?21:10
jeblairanteaya: zuul will cancel any running jobs in the gate queue which will free many jenkins slaves at once to immediately start running new gate jobs which will stress the git server21:11
anteayaah ha21:11
anteayathen we will see what happens21:11
jeblairthen we'll see how the load-balanced server performs under our current load21:11
jeblairif it performs well, we can add more nodes; if it does not, we can add more git servers21:11
anteayaso 13 minutes of downtime for you21:12
anteayaor maybe a smoke break?21:12
anteayaor maybe not quite the stress load21:12
*** sarob_ has joined #openstack-infra21:12
*** sarob has quit IRC21:13
anteaya8 minutes21:13
*** sarob_ has quit IRC21:13
jeblairwell, perhaps a few minutes to switch to the other desktop and check in on the nodepool change i'm working on21:13
*** sarob has joined #openstack-infra21:13
clarkbwe might also need to tune the maxconn settings for git://21:13
anteayajeblair: :D21:13
jeblairclarkb: this is one of those times i miss gerritbot reading merges in dev21:13
clarkbjeblair: ya21:13
anteayalike a mini vacation21:14
clarkbjeblair: I have been watching the post queue for that info now21:14
jeblairclarkb: we just merged 13 changes in the past 8 minutes21:14
anteayahere is a graph of merged changes:
clarkbfatal: git upload-pack: not our ref 39f1e9314ee28eed74cdaf3c447fc32a64e76f45 multi_ack_detailed side-band-64k thin-pack no-progress include-tag ofs-delta21:15
clarkbI think ^ may be related to non atomic mirror replication21:15
* clarkb looks in the error log of the other servers21:15
jeblairclarkb: ya, where'd you see it?21:15
clarkbjeblair: that is on git.o.o and git02 has a couple as well21:16
clarkbgit01 is clean21:16
clarkb03 is clean21:16
clarkb04 as well21:16
clarkbso not common at least not under heavy load21:17
anteaya3 minutes to gate reset21:17
clarkbjeblair: we can try switching to source balancing which may suck with the d-g slaves as they are all in similar network space, or add retries to our git stuff21:17
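clarkb's second option, adding retries around the git operations that jobs perform, could look roughly like the Python sketch below. It assumes a transient failure such as "not our ref" appears as a non-zero exit from git. Under round robin, a later attempt may also land on a backend that has caught up. The function name and its default limits are hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    delay: float = 5.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run `cmd`, retrying on a non-zero exit with a doubling delay.

    Raises CalledProcessError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd))
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Wait delay, 2*delay, 4*delay, ... between attempts.
            time.sleep(delay * 2 ** (attempt - 1))
    raise subprocess.CalledProcessError(result.returncode, list(cmd))
```

A job script might call `run_with_retries(["git", "remote", "update"])`. The `runner` parameter exists only so that the retry logic can be exercised without a real git server.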
jeblairclarkb: what's the mask on source balancing?21:18
jeblairclarkb: why not go with the full 32?21:18
clarkbjeblair: I don't know that you can provide the mask21:19
clarkbI will look into it more closely21:19
anteayagate is resetting21:19
*** AJaeger has quit IRC21:19
clarkbjeblair: also note that reload in the haproxy init script should be mostly invisible to the clients21:19
jeblairclarkb: excellent21:20
*** vipul is now known as vipul-away21:22
*** nati_ueno has joined #openstack-infra21:22
*** boris-42 has quit IRC21:23
openstackgerritA change was merged to openstack/requirements: Allow use of oslo.messaging 1.2.0a10
openstackgerrit@Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None
jeblairclarkb: l makes it look like it considers the whole ip21:23
clarkbjeblair: yeah I am beginning to think that too. Looking in the source they use a hash over 32bit space with good distribution21:24
clarkb(according to the comments anyways)21:25
clarkblet me see if we can make the change with puppet (depends on whether or not it uses reload vs restart)21:25
*** dina_belova has joined #openstack-infra21:25
jeblairgate just reset21:26
clarkbmight be a little while before we reenable puppet though so I am open to doing it by hand if you want to get it in21:26
jeblairclarkb: may as well see how this reset goes, no rush21:26
anteayawhen the patches in the gate pipeline change to unknown - that is the indicator that the gate is reset?21:27
jeblairoh, this is still markmc's chain, so it has to kick a bunch of changes out first before it actually starts jobs21:27
jeblairanteaya: there are no running jobs currently, it has canceled everything and is recomputing the new proposed series to merge21:27
anteayaokay, how do I see that using the status page, cacti and graphite?21:28
anteayaor can I?21:28
*** dina_belova has quit IRC21:28
jeblairanteaya: the status page; if you look at the gate queue, you should see that nothing has started running yet21:28
anteayaright, but all the old jobs with any logs are no longer in the queue21:29
anteayaso that can be my indicator21:29
*** dprince has quit IRC21:30
*** krtaylor has quit IRC21:31
*** krtaylor has joined #openstack-infra21:32
jeblairok it's starting jobs now21:35
anteayayes I see that21:35
*** dina_belova has joined #openstack-infra21:35
anteayaand cpu usage for user on the git server is 1.721:36
anteayaI don't see a spike21:36
jeblairanteaya: there's a 5 minute polling interval on graphite21:36
anteayaah ha21:36
anteayaI'll check back in 5+ minutes21:36
*** mriedem has quit IRC21:36
anteayatime for toast21:37
*** dina_belova has quit IRC21:40
anteayaso the jobs that stress the git server are any job with devstack in it, correct?21:41
clarkbanteaya: they are the worst offenders21:41
anteayaah okay21:41
*** vipul-away is now known as vipul21:42
anteayaso far on my cacti graph user is up to 421:42
anteayawith idle at 92.821:43
anteayanice ratio21:43
clarkbI am not seeing terrible load average on the individual servers21:43
anteayaany numbers for the etherpad?21:43
clarkbnot yet. I am not sure that the full wave has hit us yet21:44
*** pblaho has joined #openstack-infra21:44
anteayabut good early results21:44
clarkbload average: 0.39, 0.45, 0.43 on git.o.o these numbers are on cacti too21:44
* anteaya scrolls down21:45
*** ftcjeff has quit IRC21:45
*** Ryan_Lane has quit IRC21:47
*** Ryan_Lane has joined #openstack-infra21:47
jeblairclarkb: i have a disturbing thought; what if nova was the only repo on git.o.o that had packed refs?21:48
clarkbjeblair: hahahahaha21:48
clarkbwell it seems happy now in any case :)21:49
jeblairclarkb: if that graph holds, then the inflection point of load dropping on git.o.o is much closer to the point where i ran pack-refs than when we started the other servers21:49
clarkbjeblair: makes sense21:49
clarkbif that is the case we can always scale back the additional nodes21:50
jeblairwell, if so, maybe we can just throw more load at it sooner.  :)21:50
clarkbor that21:50
jeblairi see a lot of graphs on the status page that should have passed the point of git errors by now; and there are basically no git connections21:51
jeblairso i think we've seen as much 'rush' from this reset as we're going to21:51
clarkbjeblair: I agree. I do however think we should switch to source balance method21:51
jeblairclarkb: yep, let's do it now before the next rush?21:52
jeblairclarkb: and then perhaps unstick az2 and give nodepool its reins again?21:52
clarkbya I am checking puppet now and if puppet is sane will do it with puppet and if it isn't sane will do it by hand and update puppet21:52
clarkbjeblair: ++21:52
clarkblooks like it will use restart21:53
clarkbI will edit the file by hand, reload haproxy then do puppet so puppet doesn't see the change21:54
jeblairis it haproxy.cfg?21:54
clarkbjeblair: yes21:54
clarkbin /etc/haproxy/21:54
*** dmakogon_ has quit IRC21:54
jeblairmgagne: want to write a puppet patch?21:55
mgagnejeblair: go on21:55
*** ^d has quit IRC21:55
jeblairmgagne: it would be cool if changes to haproxy.cfg could run '/etc/init.d/haproxy reload' instead of 'restart' in the puppetlabs haproxy module
jeblairclarkb: do you think that makes sense?  or are the times when you'd want to reload vs restart significant enough that there isn't a clear winner?21:56
*** jjmb has joined #openstack-infra21:56
anteayacurrently no failures in the gate queue/pipeline21:57
jeblairmgagne: restart is disruptive to clients, reload is not, and you do things like 'add new backend servers' by editing that file21:57
mgagnejeblair: are you suggesting setting $manage_service to false and handling the definition of the haproxy service within the node manifest?21:57
*** mrodden has quit IRC21:58
clarkbI think that makes sense. But I don't know enough about haproxy to know if one is preferred over the other in some instances21:58
jeblairmgagne: well, either that, or make the puppetlabs module better; clarkb what do you think?21:59
*** sdake_ has quit IRC21:59
mgagnejeblair: according to my coworker, reload is preferred. If restart is used and config contains an error, you are screwed, haproxy won't restart. restart will kill all the connections, reload won't.22:00
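mgagne's point (a restart with a broken config leaves haproxy down, while a reload keeps existing connections) suggests validating before reloading. The Python sketch below does that using `haproxy -c -f <file>`, which checks a configuration and exits non-zero on errors, before calling the init script's reload. The config path and init script come from the conversation. Whether a given distribution's init script already validates the file is not assumed either way. The function name is hypothetical.

```python
import subprocess
from typing import Callable

CONFIG = "/etc/haproxy/haproxy.cfg"


def safe_reload(
    config: str = CONFIG,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Validate the haproxy config and reload only if it is valid.

    Returns False, leaving the running haproxy untouched, when the
    check fails. Reload lets existing connections finish, unlike restart.
    """
    check = runner(["haproxy", "-c", "-f", config])
    if check.returncode != 0:
        return False
    runner(["/etc/init.d/haproxy", "reload"], check=True)
    return True
```

In a puppet module, the same idea would mean having config changes notify a reload-style exec instead of the service's restart.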
openstackgerritClark Boylan proposed a change to openstack-infra/config: Use the haproxy source balance method.
clarkbmgagne: yeah that is why we want reload. it should be much more invisible to end users22:01
clarkbI wrote 43359 so I can see what the puppet concat diff looks like before modifying the file22:01
clarkbjeblair: once I have reloaded haproxy we should merge these puppet changes22:01
mgagnejeblair: depends on your urgency: designing and proposing a patch, having it accepted, releasing on forge won't happen in one day22:01
*** prad_ has quit IRC22:02
clarkbmgagne: understood. we will work around it now. But it is something that will probably end up being desirable to us and others22:02
clarkbat the very least I suppose I should open a bug with puppetlabs22:02
mgagneclarkb: yes, bodepd could use his contact to fast-forward the patch ;)22:02
jeblairclarkb: also, hunner looks like he's involved in that22:03
mgagneclarkb: it will be useful to us too as we are dealing with haproxy tuning atm22:03
clarkboh I could just bug hunner22:03
anteayathe entire gate queue/pipeline has some test jobs running, so far no failures22:03
mgagneclarkb: yes, hunner is the man22:03
clarkbmgagne: are you puppetconfing?22:03
anteayafirst failure is on the last (27th) patch22:04
mgagneclarkb: could you make the scope of your question smaller? :D22:04
jeblairanteaya: and it's a real test question22:04
jeblairanteaya: and it's a real test failure22:04
* jeblair just writes what he reads22:04
anteayayes, a voting job22:04
mtreinishanteaya: does that include the testr-full jobs too?22:04
clarkbmgagne: are you at the conference?22:04
clarkba bunch of folks are there22:05
mgagneclarkb: not me =)22:05
clarkbjust curious if you were part of the bunch22:05
anteayamtreinish: testr jobs are running22:05
pleia2I have a dr appt to run off to, bbiab22:05
mtreinishanteaya: yeah, but they're not voting. I was curious if you've seen random failures there. (since they wouldn't trigger a gate reset)22:06
clarkbthe way puppet concat works is weird. I am not entirely sure that merging that puppet change won't cause an haproxy restart22:06
mgagneclarkb: I don't use puppet for client products, only internal stuff, mainly openstack. So they sent the ones designing products with puppet =)22:06
jeblairgrenade test failed:
clarkbjeblair: but I figure I should write the change locally, reload haproxy then worry about the restart later22:06
anteaya33 minutes gate-grenade-devstack-vm failed22:06
anteayasee ya pleia222:06
jeblairbut that's also a real test failure, not an infra failure)22:06
anteayamtreinish: yes, so far testr jobs are running, no results back yet in the grouping22:06
mtreinishanteaya: ok thanks22:07
jeblairi really need to mask aborted test results in zuul22:07
clarkbjeblair: reloading haproxy now22:07
mgagneclarkb: haproxy will be "notified" if haproxy.cfg is regenerated:
anteayajeblair: yay22:07
clarkbmgagne: yeah and I think concat ends up building it from scratch22:07
clarkbmgagne: but I think it checks a diff maybe22:08
clarkbhaproxy reloaded22:08
anteayajeblair: now I understand your prior question, I don't know how to open test logs reporting failure when the patch is still in the queue22:08
anteayait takes me to jenkins and then I can't get to the log itself22:09
mgagneclarkb: it concats a bunch of fragments using a bash script:
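clarkb's question is whether a concat-built haproxy.cfg always counts as "changed". The Python model below is simplified and is not the module's actual bash script. It assumes fragments are joined in sorted name order, and that the target file is rewritten (and the subscribed service notified) only when the assembled text differs from what is already on disk. Under that assumption, a Puppet run that produces byte-identical config would not trigger a refresh. The function names are hypothetical.

```python
import os
import tempfile
from typing import Dict


def assemble(fragments: Dict[str, str]) -> str:
    """Join fragment bodies in sorted fragment-name order."""
    return "".join(fragments[name] for name in sorted(fragments))


def update_if_changed(path: str, fragments: Dict[str, str]) -> bool:
    """Rewrite `path` only when the assembled content differs.

    Returns True when the file was written, i.e. when a subscribed
    service would be notified."""
    new = assemble(fragments)
    try:
        with open(path) as f:
            if f.read() == new:
                return False  # identical content: no write, no notification
    except FileNotFoundError:
        pass  # first run: nothing to compare against yet
    with open(path, "w") as f:
        f.write(new)
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "haproxy.cfg")
    frags = {"00_global": "global\n", "10_listen_git": "listen git\n"}
    print(update_if_changed(target, frags))  # True: file did not exist yet
    print(update_if_changed(target, frags))  # False: content unchanged
```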
clarkbanteaya: click on "console log" on the left hand side in jenkins22:09
anteayaclarkb: thanks22:09
clarkbjeblair: If you are happy with that stack of changes I think you can approve them now22:10
clarkbthen we can reenable puppet on the servers22:10
anteayahere is a python26 error and it looks like a real error, not a git timeout:
jeblairclarkb: including source?22:10
*** weshay has quit IRC22:10
clarkbjeblair: yes including source22:10
clarkbjeblair: I will just be careful when I start puppet again... I am not sure there is much we can do there22:11
clarkbI could move the init script aside :)22:11
anteayamtreinish: here is a testr failure for a swift patch:
mtreinishanteaya: thanks I was just looking at it. It looks like one I've seen before where all the server creates in nova go to an error state22:12
clarkbyou can definitely see it is no longer round robinning requests if you tail the log22:14
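The end of round-robinning presumably comes from the "Use the haproxy source balance method" change that was just loaded. With `balance source`, haproxy hashes the client's source IP so a given client keeps reaching the same backend as long as the set of available servers does not change. The Python comparison below is rough. `crc32` stands in for haproxy's own hash, server weights are ignored, and the backend names are hypothetical.

```python
import itertools
import zlib
from typing import Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """`balance roundrobin`: each new request goes to the next server in turn."""
    return itertools.cycle(servers)


def by_source(client_ip: str, servers: List[str]) -> str:
    """Approximate `balance source`: hash the client address onto one server.

    crc32 is only a deterministic stand-in for haproxy's hash, and weights
    are ignored."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


if __name__ == "__main__":
    backends = ["git01", "git02", "git03", "git04"]  # hypothetical names
    rr = round_robin(backends)
    print([next(rr) for _ in range(3)])  # successive requests hit different servers
    print({by_source("203.0.113.7", backends) for _ in range(3)})  # one server only
```

Pinning can matter for git because a single clone over smart HTTP makes more than one request. Under round robin, those requests could land on backends whose mirrors are not in sync.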
anteayahave a nova patch failing both 26 and 27, looks like real failures - 23 minutes until it finishes22:14
clarkbanteaya: link to py2622:14
anteayathe patch passed both in the check queue22:15
clarkbyup real failure22:16
anteayaI see those as being actual python failures, not git timeouts22:16
anteayayay, my log parsing skills are getting better22:16
anteayafunny they passed in check22:16
*** burt has quit IRC22:16
jeblairclarkb: did you confirm whether smart http client is one connection?  if so, do you want to round-robin it?  or shelve this topic until we have more graphs for 'source'?22:18
clarkbpleia2 mind checking cgit?22:18
jeblairclarkb: i think she's afk22:18
clarkbjeblair I think shelve22:18
jeblairclarkb: wfm22:18
clarkbjeblair thanks22:18
anteayaat 14 minutes we have a postgres failure:
anteayashe is at a dr appointment22:20
mgagneclarkb: trying to see if puppet service resource supports reload. But I'm always finding puppet bugs that have been opened for years without patch or conclusion...22:21
clarkbmgagne: I think you have to give it a restart command or something like that22:21
clarkbwhere puppet intends to `restart` but you have told it to do something else22:22
mgagneclarkb: yes, which (IMO) is suboptimal22:22
clarkbmgagne: I agree22:22
*** pblaho has quit IRC22:22
anteayathe postgres error is from nova patch 28819,322:22
jeblairclarkb: shall i unstick az2 nodepool now?22:23
jeblairanteaya: it's not an infra error22:23
anteayajeblair: yay22:23
anteayaso far, no infra errors in the gate22:23
mgagneclarkb: is haproxy actually restarted when the config is updated?22:23
clarkbjeblair: yes I think we can open the flood gates22:23
clarkbmgagne: --noop says the service will be restarted22:23
clarkbmgagne: let me get the exact log line22:23
mgagneclarkb: service resource has the "refreshable" feature22:23
anteayapatch which will spark a gate reset to be finished in 4 minutes22:24
clarkbmgagne: notice: /Stage[main]/Haproxy/Service[haproxy]: Would have triggered 'refresh' from 1 events22:24
jeblairclarkb: done; az2 nodes should start showing up in a few mins22:24
anteaya8 in post, hopefully 6 more to join them22:24
mgagneclarkb: we can only hope the service provider detects that the haproxy service can actually be reloaded.22:25
mgagneclarkb: I don't see any trace of reload in that file:
anteayathis patch has the nova py26 and py27 errors: it is going to remain in the queue after reset, I guess there is nothing we can do about that22:27
anteayait needs the logs from the failure attached to the patch and it won't get them otherwise22:27
clarkbanteaya: yeah that is normal22:28
jeblairclarkb: the first new az2 node is in use, it appears to be running a job22:28
clarkbmgagne: ok, I think I will just try starting puppet again on that server when jenkins is quiet22:28
clarkbmgagne: that way if it restarts it doesn't hurt a lot of stuff and we know about it. Otherwise \o/22:28
jeblairclarkb: you mean in november? :)22:28
clarkbjeblair: Friday afternoons are usually sanish22:28
clarkbof course this is no normal week22:28
mgagneclarkb: thanks for asking about reload, now I have to fix haproxy to reload with my setup =)22:29
anteayathat git.o.o cacti graph just looks beautiful22:30
openstackgerritA change was merged to openstack/requirements: Allow pyflakes 0.7.3
openstackgerritA change was merged to openstack-infra/config: Swap git daemon in xinetd for service
anteaya10, 10 pretty patches in post ah ha ha ha *lightning flash*22:32
clarkbIt feels like we are moving again22:33
anteayalook at that graph of test nodes climb22:34
openstackgerritA change was merged to openstack-infra/config: Load balance git requests.
*** dina_belova has joined #openstack-infra22:36
openstackgerritA change was merged to openstack-infra/config: Add a mirror repack cron to git servers
clarkbjeblair: the time remaining numbers when you hover over the progress bars on the status page don't add hours properly22:39
clarkbjeblair: you can see that now if you look at the gate tempest jobs. I intend on taking a look at that when things are not so busy if no one else beats me to it22:39
openstackgerritA change was merged to openstack-infra/config: Use the haproxy source balance method.
*** dina_belova has quit IRC22:41
jeblairclarkb: thx; yeah, i _think_ the bug is in status.js22:41
anteayaclarkb: just seems to be the ones in the gate, check and post seem reasonable22:41
jeblairclarkb: also, it needs to round better; anything < 60 seconds is 0min22:42
clarkbanteaya: yeah it has to do with jobs that roll over an hour in length22:42
clarkbanteaya: we keep the hour set to 0022:42
anteayawhat happens if you just go with minutes and get rid of hours22:42
anteaya90 minutes rather than 1 hour 30 minutes22:43
clarkbanteaya: humans don't like reading timestamps like that22:43
anteayaI can live with it22:43
anteayabut other humans, okay22:43
anteayamovie running times are all like that22:43
anteaya120 minutes22:43
anteaya200 minutes22:43
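The status-page bug above is a display problem. Once a job's remaining time passes an hour, the hours field stays at 00, and anything under 60 seconds shows as 0 min. The real code is in zuul's status.js. Below is a Python sketch of the same arithmetic that carries whole hours out of the total minute count. Rounding up to the next whole minute is an assumed policy, and `format_remaining` is a hypothetical name.

```python
def format_remaining(ms: int) -> str:
    """Format remaining time in milliseconds as '45 min' or '1 hr 05 min'."""
    minutes = -(-ms // 60000)             # round up to whole minutes (ceiling division)
    hours, minutes = divmod(minutes, 60)  # carry full hours out of the minute count
    if hours:
        return f"{hours} hr {minutes:02d} min"
    return f"{minutes} min"
```

Dropping the hours branch would give anteaya's "90 minutes" style instead; the divmod step is the part that keeps the hour count from being stuck at zero.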
anteayagate reset22:44
anteaya12 in post!22:44
anteayalook at the test node numbers climb22:44
mgagnefeels like a sport commentator =)22:44
anteayaI have to do something22:45
anteayadon't know enough to write any scripts to do any helpful changes22:45
anteayaI would have to ask questions, slows them down22:45
anteayaI'll learn more when it is quieter22:46
wenlockhey guys, grats on getting your current challenge fixed... was wondering if i could ask a few questions ...  ive been working on trying to understand puppet and using wiki22:47
jeblairclarkb: a noticable bump in the git cpu graphs22:49
jeblairwenlock: what's your question?22:49
clarkbjeblair: we still seem to be under control though22:51
jeblairclarkb: yep, seems well within capability atm22:51
anteayaclarkb: here is a bug report for you:
clarkbjeblair: I am going to enable puppet on 01-04 since all of the outstanding changes that affect them have merged22:52
uvirtbotLaunchpad bug 1215659 in openstack-ci "zuul status bars hover box "time remaining" fails after 61 minutes" [Undecided,New]22:52
clarkbjeblair: I will hold off on git.o.o until I can do it semi safely22:52
jeblairclarkb: ok22:53
*** mrodden has joined #openstack-infra22:54
clarkbin other news I think the kicking out of changes that may not merge is the greatest thing ever22:54
jeblairThis change was unable to be automatically merged with the current state of the repository and the following changes which were enqueued ahead of it: 31061, 41723, 42430, 42431, 43088, 42751, 42746, 42744, 42743, 41070, 42745, 42765, 42747, 42432, 42433, 42434, 42435, 42436, 42437, 42748, 42749, 42750, 42752, 40845, 37465, 38601. Please rebase your change and upload a new patchset.22:55
jeblairclarkb: ^ you mean like that? :)22:55
jeblairthere's a merge conflict in there!  somewhere!22:55
clarkbjeblair: ya :)22:55
clarkbI think the choice to sacrifice the few for the many was the correct one22:56
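The message jeblair quotes comes from the gate's merge check. Each change is merged onto the speculative state built from every change ahead of it. A change that does not merge is removed with that notice instead of holding up the changes queued behind it. Below is a toy Python model of that loop. The `Change` type and the rule that "two changes conflict if they touch the same file" are simplifying assumptions, since real conflicts are decided by git hunk by hunk.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Change:
    number: int
    files: FrozenSet[str]  # files the change modifies


def conflicts(change: Change, ahead: List[Change]) -> bool:
    """Toy rule: a conflict is any file also touched by a change kept ahead."""
    return any(change.files & other.files for other in ahead)


def build_gate(queue: List[Change]) -> Tuple[List[Change], List[Change]]:
    """Walk the queue in order, keeping changes that merge onto those kept ahead."""
    kept: List[Change] = []
    ejected: List[Change] = []
    for change in queue:
        if conflicts(change, kept):
            ejected.append(change)  # reported back: "Please rebase your change"
        else:
            kept.append(change)
    return kept, ejected
```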
wenlocki setup wiki on a private server, using the wiki.pp module  it installed ok, but seems only mysql is started22:56
wenlockis there some additional modules that control started state?22:57
clarkbgate throughput is much higher now in the best case scenario22:57
wenlockor should i have expected to see a running server on port 80?22:57
jeblairclarkb: the needs of the many outweigh the needs of the few (or the one).22:57
anteayalook at all those recent merges:
jeblairwenlock: unfortunately, some parts of the wiki servers aren't in puppet :(22:57
jeblairwenlock: i believe Ryan_Lane is planning on working on that when he gets a chance22:58
jeblairwenlock: but i think at least some of the config is just on-host22:58
Ryan_Lanevery little of it is just on-host22:58
jeblairwenlock: however, we do have some documentation about how upgrades are manually performed22:58
Ryan_Lanejust the mediawiki software and its config22:58
Ryan_Laneeverything else is in the module22:58
jeblairRyan_Lane: ah ok22:59
jeblairwenlock: that upgrade documentation might be able to serve as install configuration too22:59
jeblairwenlock: that upgrade documentation might be able to serve as install documentation too22:59
wenlockok, cool... thats making a little more sense now :D23:00
*** datsun180b has quit IRC23:00
clarkbpuppet is running on 01-0423:01
clarkbnow I will check cgit23:01
clarkbcgit seems happy23:02
anteayai don't see any failures in the gate queue/pipeline yet23:02
clarkbjeblair: you write a lot of commits apparently23:03
*** rnirmal has quit IRC23:03
mgagneat least I'm on the list ^^'23:03
*** notmyname has quit IRC23:04
clarkbI wonder if that is counting patchsets23:04
*** notmyname has joined #openstack-infra23:04
anteayamgagne: I'm just hoping I am in other somewhere23:04
* clarkb looks in status.js to focus on something different for a bit23:05
*** pcrews has quit IRC23:06
mgagneclarkb: how about upgrading apache puppet module to latest version :D /jk23:06
clarkbmgagne: you are funny23:06
*** wenlock has quit IRC23:07
anteayamgagne: yay I'm on the list, thanks23:07
anteayaclarkb: did you see this?
uvirtbotLaunchpad bug 1215659 in openstack-ci "zuul status bars hover box "time remaining" fails after 61 minutes" [Undecided,New]23:07
anteayaor did it get lost in the blur?23:07
*** pabelanger_ has quit IRC23:07
clarkbanteaya: I did thanks23:07
clarkbit popped up in my email which is what prompted me to look a tit23:08
clarkbthat is an unfortunate typo23:08
anteayalet it pass23:08
*** pabelanger has joined #openstack-infra23:08
anteayadid you ever get real food, clarkb?23:08
*** notmyname has quit IRC23:08
anteayaor are you still running on sandwich?23:09
clarkbanteaya: sandwiches are real food23:09
*** notmyname has joined #openstack-infra23:09
anteayathat they are yes, I was referring to the aromatic food that was cooking earlier23:09
clarkbjeblair: I think I see the bug in status.js23:09
* anteaya lives on sandwiches herself23:09
*** jhesketh has quit IRC23:11
*** sdake_ has joined #openstack-infra23:12
*** jhesketh has joined #openstack-infra23:14
*** _TheDodd_ has quit IRC23:14
pleia2clarkb: back, lmk if you still need tests23:15
clarkbpleia2: I think we are good23:16
openstackgerritClark Boylan proposed a change to openstack-infra/config: Fix zuul status hours display.
clarkbjeblair: anteaya ^23:16
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Fix error with stats for de-configured resources
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Make jenkins username and private key path configurable
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Move setup scripts destination
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Change credentials-id parameter in config file
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Reduce timeout when waiting for server deletion
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Add option to test jenkins node before use
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Add JenkinsManager
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Add an ssh check periodic task
jeblairclarkb: something about a lot of patches?23:17
clarkbjeblair: ya you write them :)23:17
pleia2whee :)23:17
*** mriedem has joined #openstack-infra23:18
jeblairclarkb: so that adds the node test feature; it's completely optional, and i'm not sure i want to use it, but i figured it'd be good to get that lever in place in case we want to pull it23:18
jeblairclarkb: i'm actually more leaning toward thinking that getting zuul to re-run jobs that come back with jenkins exceptions is the way to go, and i think we can do that without a change to the gearman plugin23:18
jeblairclarkb: but i'll go ahead and write up the jjb change to populate the node test job so it'll be there if we want it23:19
*** mrodden1 has joined #openstack-infra23:19
*** mrodden has quit IRC23:20
clarkbsounds good. I may take a break shortly to do something other than type in a terminal. But plan to do some code review after that23:23
anteayaif we are re-running jobs that return with exceptions do we have some form of counter so it doesn't loop endlessly?23:23
clarkbI have found that code review at night is nice because there are few distractions23:23
clarkbanteaya: In this case it may be ok to loop endlessly as the failure is on the jenkins side23:23
jeblairanteaya, clarkb: i think i would use a counter; if jenkins goes crazy i don't want everything stuck in zuul23:24
anteayamakes sense23:24
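The plan above is to have zuul re-run jobs that come back with Jenkins exceptions. Following anteaya's suggestion, it would keep a counter so a misbehaving Jenkins cannot keep changes stuck in zuul. Below is a minimal Python sketch of that bookkeeping. `RetryTracker`, `should_retry`, and the limit of 3 are hypothetical, because the log does not settle on a number.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_RETRIES = 3  # hypothetical limit; the log only says "use a counter"


@dataclass
class RetryTracker:
    """Re-run jobs that failed because of Jenkins, but only a bounded number of times."""
    max_retries: int = MAX_RETRIES
    attempts: Dict[str, int] = field(default_factory=dict)

    def should_retry(self, job_id: str, jenkins_exception: bool) -> bool:
        if not jenkins_exception:
            return False  # a real test failure is reported, never retried
        count = self.attempts.get(job_id, 0)
        if count >= self.max_retries:
            return False  # give up so a misbehaving Jenkins cannot wedge the queue
        self.attempts[job_id] = count + 1
        return True
```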
jeblairclarkb: do you think we're in a good place to add, say, 8 more centos nodes?23:24
jeblairactually, we should re-evaluate now that they should be using the git protocol23:25
jeblairthey may not be as far behind now23:25
clarkbjeblair: they are using git protocol and a spot check showed that it sped up ggp tremendously for them23:25
*** Adri2000 has quit IRC23:25
anteayawe have a LOST in the gate, 40833, 10:
jeblairanteaya: yeah, that's the situation that either the nodepool change or (hopefully, still looking into it) a zuul change would fix23:26
anteayawhat more does zuul need fixed?23:27
anteayatrying to keep up23:27
clarkbI think the LOST jobs is the last major outstanding item23:27
jeblairanteaya: the change we were just talking about with exceptions23:27
anteayayay, we finally got there23:27
anteayasorry, I will re-read23:27
clarkbwhich means I need to get into code review mode soon23:27
anteayaoh yeah, coming back from jenkins with an exception23:27
clarkbif it makes everyone feel better about this week NASDAQ halted trading today due to a technical issue23:28
anteayayou are kidding23:29
clarkbnope. for 3 hours today they shut it down23:29
jeblairclarkb: you know, the sun's magnetic polarity is reversing.  just sayin.23:29
anteayacan't imagine what it would be like on the NASDAQ tech team23:29
anteayaha ha ha23:29
anteayait happens every 11 years23:29
anteayabut yeah, 11 years ago we didn't have the reliance on tech we have today23:30
anteayathat is for sure23:30
clarkbjeblair: I joked in a different channel that their ops team must be at puppetconf23:31
clarkbanteaya: ^23:32
anteayaha ha ha23:32
jeblairclarkb: hrm, it looks like that error came back as a regular work_fail, just without a result23:32
jeblairclarkb: so not quite as nice as a work_exception, but that might still be actionable23:32
clarkbjeblair: hmm. I think jenkins is catching that and bottling it up before gearman plugin sees it23:33
clarkbjeblair: so it becomes a failed test with no result23:33
*** Adri2000 has joined #openstack-infra23:33
clarkbthere is just not enough data in the return from the job future23:33
jeblairclarkb: possibly;  but i'm also double checking that either gearman-plugin or java-gearman isn't turning that into work_fail23:34
*** jhesketh has quit IRC23:35
clarkbjeblair: does gearman plugin break the timeout plugin? there are a few jobs that seem to have run much longer than is allowed23:36
clarkbback when git was slow23:36
*** dina_belova has joined #openstack-infra23:36
jeblairclarkb: yeah, i think you're right; if gearman-plugin gets an exception, it should return work_exception23:37
clarkbjeblair: there may be info returned by the future that can be examined23:38
clarkbjeblair: you may have to grep through the console log which seems dirty23:38
clarkbor treat a failure with no result as a jenkins exception23:39
jeblairit seems weird that the result would be null23:39
*** jhesketh has joined #openstack-infra23:39
jeblairit seems accurate enough; i'm willing to do it, but it also seems tenuous23:39
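The LOST job came back as a plain WORK_FAIL with no result rather than as a WORK_EXCEPTION. Telling infrastructure errors apart from test failures therefore falls back on clarkb's heuristic: treat a failure with no result as a Jenkins exception. Below is a Python sketch of that rule. The packet names come from the Gearman protocol. `is_infra_error` and the result strings are assumptions, and jeblair's caveat that the rule is tenuous still applies.

```python
from enum import Enum
from typing import Optional


class Packet(Enum):
    """Gearman packet types that end a job."""
    WORK_COMPLETE = "WORK_COMPLETE"
    WORK_FAIL = "WORK_FAIL"
    WORK_EXCEPTION = "WORK_EXCEPTION"


def is_infra_error(packet: Packet, result: Optional[str]) -> bool:
    """Guess whether a finished job failed for infrastructure reasons.

    WORK_EXCEPTION is treated as a Jenkins-side problem. A WORK_FAIL with
    no build result is treated the same way; that is a heuristic, not a
    guarantee."""
    if packet is Packet.WORK_EXCEPTION:
        return True
    if packet is Packet.WORK_FAIL and result is None:
        return True
    return False
```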
clarkbto slightly change the subject, I think we should release a new zuul version if last night's bug fix holds up23:40
clarkbthough that bug was only in unreleased zuul so it may not be very urgent23:40
*** dina_belova has quit IRC23:41
jeblairi think it's probably time for me to write a mailing list update23:41
jeblairalong the lines of 'mostly better' still working on a few things.23:42
*** shardy is now known as shardy_afk23:42
jeblairand i guess an announcement of git.o.o (not the way i expected it to be announced)23:43
clarkbthese things happen23:43
pleia2jeblair: including git.o.o in the same post? (I don't mind writing a separate one, I was thinking about blogging about it too)23:44
*** sdake_ has quit IRC23:44
jeblairi think it actually deserves its own post, so i think i'll mention it, but i think pleia2 should also write an email about it23:44
anteayajeblair: I think there would be many happy people if there was a ml update23:44
jeblairi think i should mention it as i describe what we're doing to handle the load23:44
jeblairbut i also want people to really learn about git.o.o and how cool it is23:44
clarkbjeblair: ++23:45
jeblairand that should be its own topic/post23:45
jeblairpleia2: how does that sound?23:45
anteayayes, I agree23:45
pleia2jeblair: wfm23:45
clarkbI think if you mention it in passing to explain the mitigation of test failures that leaves the door open to give it a proper writeup23:45
pleia2I'll update the ci.o.o/git docs real quick first23:45
pleia2(I'll need clarkb to review)23:45
clarkboh ya I completely neglected to write docs on the haproxy stuff >_>23:46
*** fbo is now known as fbo_away23:46
jeblairpleia2: cool, so you'll handle the git.o.o post then, at your leisure, and i'll mention it in passing and that you'll be sending a real announcement23:46
jeblairclarkb: i haven't written nodepool docs yet either23:46
pleia2clarkb: no worries, I'm on it23:46
jeblairspeaking of which...23:46
jeblairfungi hasn't disappeared yet, has he?23:46
* clarkb waits for gerritbot to announce new change adding nodepool docs :)23:47
jeblairclarkb: ha23:47
clarkbjeblair: it sounded like today was going to be busy for him23:47
clarkband that he would try to be on this evening23:47
jeblairok, but he's not on a boat yet, so he might catch this...23:47
clarkbjeblair: correct. boat is tomorrow morning23:47
jeblairfungi: for the 'run your own devstack-gate node' thing -- i need to delete all the node launching stuff from d-g....23:47
jeblairfungi: the shell scripts to actually do all the work are fairly well split out now...23:48
jeblairfungi: so there are two approaches for migrating that23:48
clarkbI am going to run home really quick so that I can do code review on the couch23:48
clarkbs/code/docs/ as appropriate23:48
jeblairfungi: 1) instruct people on how to run those scripts on a node (sort of a one-off "make this a devstack-gate node" process)23:49
openstackgerritA change was merged to openstack-infra/zuul: Make updateChange actually update the change
jeblairfungi: or 2) how to set up a local nodepool (more complicated, but you can spin up replacement nodes easily)23:49
jeblairfungi: (#2 is more or less palatable depending on whether nodepool still works with sqlite in low-volume; that's unknown at this point)23:50
*** michchap has joined #openstack-infra23:52
*** Adri2000 has quit IRC23:53
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Add node-test job
*** rcleere has quit IRC23:55
