Thursday, 2016-12-08

*** seroyer has joined #openstack-powervm13:29
esberglu_#startmeeting powervm_ci_meeting13:31
openstackMeeting started Thu Dec  8 13:31:39 2016 UTC and is due to finish in 60 minutes.  The chair is esberglu_. Information about MeetBot at
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.13:31
openstackThe meeting name has been set to 'powervm_ci_meeting'13:31
esberglu_Hey guys13:31
* adreznec waves13:32
esberglu_#topic status13:34
esberglu_So those runs are _slowly_ going through13:34
esberglu_The runs themselves seem to be fine13:35
thorst_how slow?13:35
adreznecSeeing some scary times out there on the queue13:35
esberglu_thorst_: I can ping you the zuul ip if you want to look13:35
thorst_I have it13:36
thorst_10 hours?13:36
esberglu_But like 10 hours13:36
adreznecDo we know what's bogging things down yet?13:36
thorst_are any actually going?13:36
thorst_the jenkins has a ton of idle VMs.13:36
esberglu_Yeah. There are 3 going through right now, 20 - 30 have gone through in the last 12 hours13:37
adreznecIt doesn't actually look like anything's been running for all that long13:38
esberglu_That's about the volume I would expect13:38
adreznecbut things have been in the queue for a long time13:38
esberglu_It's just they sit around in the queue forever first13:38
esberglu_Which means the queue just keeps getting bigger13:39
adreznecOk, so I think we need to nail down exactly what's causing the initial queuing to build up13:41
adreznecIf it's git issues, then we probably need to invest in mirrors at this point13:41
thorst_did we tell Zuul to only let 3 through at a time?13:42
thorst_wasn't there some gate in zuul about throughput?13:43
thorst_I don't really know how this could be git...what's that train of thought there?13:43
thorst_(not that a mirror is a bad idea...)13:44
esberglu_No the only zuul conf that changed was moving nova from silent to check pipeline13:44
adreznecWell we were seeing those git performance issues yesterday, and one theory was that we were hitting some kind of internal timeouts doing the clones/fetches13:45
thorst_ahh, cause zuul does some sort of clone13:45
thorst_which I don't understand...I'd have thought that was just in the Jenkins slave VM13:45
adreznecBecause we could see it attempting to do the same fetch multiple times on different PIDs13:45
esberglu_We were seeing these git fetch <change>13:46
esberglu_That seemed to just be looping13:46
adreznecNot sure we have enough data to say that concretely13:46
adreznecBut it was a theory13:46
esberglu_The only other thing that I thought it might be13:46
esberglu_There are these changes in the queue that depend on like 10 other changes13:47
esberglu_And some of the changes are having merge issues13:47
thorst_why is zuul doing this?  ssh -i /var/lib/zuul/ssh/id_rsa -p 29418 git-upload-pack '/openstack/nova'13:48
esberglu_Here's an example of one of those changes
esberglu_Not sure13:48
thorst_that process has been running for a while13:49
thorst_the commit message in zuul about why that runs is "I'll document this later"13:51
adreznecThat should never really be a particularly long-running command13:51
thorst_I suggest we kill that proc13:52
thorst_and see if we unwedge.13:52
adreznecthorst_: How long is a while?13:53
thorst_says 08:48 in the ps aux output13:53
thorst_so under 5 min.13:54
thorst_it's done now.13:54
esberglu_Yeah I killed it. Another one just popped up in its place13:54
thorst_did you kill that second one?13:55
thorst_they just seem to be really slow13:55
thorst_wonder what git-upload-pack does13:56
thorst_needs some investigation, because I don't think a clone would help that...13:56
thorst_well...when in doubt, just run by hand.13:58
thorst_it returns quite the amount of data.13:59
adreznecI think it does discovery/fetching of objects from git during a fetch14:00
adreznecNot 100% sure on that14:00
is that the status.  Figure out why we're wedged.14:02
thorst_(since we're over on time in the meeting)14:02
adreznecClearly we need longer than 30 minutes to investigate this14:03
thorst_just running the command ourselves may take 30 minutes14:03
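On the question above of what git-upload-pack does: it is the server side of a fetch, and it starts by advertising every ref it has, framed in git's pkt-line format, before any objects are negotiated. The sketch below is a minimal pkt-line parser for counting those advertised refs. It is an illustration only, not anything zuul runs; the function names are hypothetical, and it assumes the classic (protocol v1) framing where `0000` is the only special packet.

```python
FLUSH = None  # marker used here for the "0000" flush packet


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its 4-hex-digit length (the length counts the prefix)."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data: bytes) -> list:
    """Split a pkt-line byte stream into payloads; FLUSH marks a flush packet."""
    packets = []
    pos = 0
    while pos < len(data):
        header = data[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated pkt-line header")
        length = int(header, 16)
        if length == 0:
            # "0000" is a flush packet and carries no payload.
            packets.append(FLUSH)
            pos += 4
            continue
        if length < 4:
            # Lengths 1-3 are not valid in the framing assumed here.
            raise ValueError("invalid pkt-line length: %d" % length)
        if pos + length > len(data):
            raise ValueError("truncated pkt-line payload")
        packets.append(data[pos + 4:pos + length])
        pos += length
    return packets


def count_advertised_refs(data: bytes) -> int:
    """Count the packets that precede the first flush in a ref advertisement."""
    count = 0
    for packet in parse_pkt_lines(data):
        if packet is FLUSH:
            break
        count += 1
    return count
```

Feeding the captured output of the hand-run command into `count_advertised_refs` would show how many refs the server sends on every fetch. A large count would be consistent with the "quite the amount of data" observation, though that is a guess until someone measures it.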
esberglu_Yeah. Other than that I put a wiki page up for CI14:03
esberglu_If you guys want to take a look. Still need to finish a few sections and polish it up14:03
adreznecWhere did it land?14:04
adreznecNovalink wiki?14:04
esberglu_Neo dev wiki14:05
esberglu_Subpage under PowerVM CI System14:05
esberglu__WIP_ CI System and Deployment14:06
thorst_so that is also for wangqwsh as you train him to be able to redeploy the CI?14:06
thorst_excellent.  And if we do need a git mirror, that may be a good project for wangqwsh to drive14:06
openstackMeeting ended Thu Dec  8 14:07:24 2016 UTC.  Information about MeetBot at . (v 0.1.4)14:07
adreznecI know we've talked about having a mirror setup before14:07
adreznecMight be worth investigating again just to take some external network load off and improve times14:08
adreznecMight even be able to serve all 3 from the same system realistically14:08
thorst_I could change the zuul server to virtio14:10
thorst_the nic is not virtio right now14:10
thorst_that may help?  But would require a shut down of the zuul server.14:11
adreznecMaybe? I guess we should look at how much traffic we're actually driving in that case14:12
adreznecvirtio is definitely faster than e100014:12
adreznecjust not sure how much bandwidth we're consuming14:12
adreznecI don't think restarting should be a huge deal if we had to14:13
thorst_it's not even e100014:13
adreznecWell then14:14
adreznecMight be worth a shot then14:14
thorst_esberglu_: let me know when you want that done.  Just shut down the VM and then let me know14:14
esberglu_thorst_: Okay. I will let the current runs finish then shut it down14:15
*** seroyer has joined #openstack-powervm14:30
*** jwcroppe has joined #openstack-powervm14:31
thorst_adreznec: pulling down from is super slow14:33
thorst_to the point of stalling14:34
thorst_so may be worth trying that switch sooner rather than later14:35
adreznecWhat are we pulling from pokgsa14:35
thorst_I pulled something random14:35
adreznecAnd this is from a machine in POK?14:37
adreznecThat's crazy...14:37
thorst_well, not in same lab14:38
thorst_but similar area14:38
thorst_also just checked...it's not the disk14:39
adreznecHeck, it shouldn't matter even if it was in RCH or AUS14:40
thorst_esberglu_: do we have an ETA on the shut down?14:48
esberglu_One run is still going about 50 min in.14:48
esberglu_So 20ish minutes14:48
esberglu_I can kill it if you want14:49
thorst_I can probably wait14:49
*** esberglu has joined #openstack-powervm15:39
esbergluthorst_: The mgmt server is shut down15:40
thorst_esberglu: updating it now15:56
thorst_interesting...it's stuck at 32 KB/s16:03
thorst_I wonder if we got hit by some silly qos rule16:03
thorst_anything in the POK lab itself - 50+ MB/s16:06
thorst_anything out...32 KB/s16:06
thorst_that'll kill us.16:06
esbergluYeah it will16:07
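To put the two rates above in perspective, here is a small back-of-the-envelope sketch. The 32 KB/s and 50 MB/s figures come from the chat; the 100 MB fetch size is purely an assumption for illustration, since nobody measured the real transfer size, and the helper name is hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Assumption for illustration only: the actual size of a fetch was not measured.
ASSUMED_FETCH_BYTES = 100 * MB


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return how long a transfer takes at a steady rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


# Rates quoted in the log: traffic leaving the lab vs. traffic inside it.
slow = transfer_seconds(ASSUMED_FETCH_BYTES, 32 * KB)   # 3200 s, roughly 53 minutes
fast = transfer_seconds(ASSUMED_FETCH_BYTES, 50 * MB)   # 2 s
```

Under that assumed size, a single fetch at the slow rate would take most of an hour, which would go some way toward explaining changes sitting in the queue for hours if every change needs its own fetch.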
esbergluthorst_: I'm seeing 2 MB/s cloning from github?16:24
thorst_it has to be the path between github versus gerrit16:29
esbergluWhat were you seeing 32 KB/s on?16:30
thorst_from openstack gerrit16:30
thorst_and pokgsa16:31
adreznecJust curious, have we looked at what the routes look like for github/git.o.o16:33
adreznecSee if there are way more hops to reach one or the other, etc?16:34
adreznecthe pokgsa one is just weird though...16:34
adreznecThat should be the fastest of all16:34
thorst_I was doing that16:38
thorst_I think it dies on path 4.16:38
thorst_looking back at it17:48
thorst_esberglu: switched to a local DNS18:05
thorst_seems to be much better18:05
thorst_give that a shot?18:05
thorst_might want to try that DNS with the other servers too...18:06
thorst_(POK went from 32 KB/s to 4 MB/s)18:09
openstackgerritTaylor Jakobson proposed openstack/nova-powervm: WIP: First pass at imagecache
esbergluthorst_: Hmm stuff is still hung up in the zuul queue. Gonna try restarting zuul18:43
thorst_yeah, I saw that...18:47
thorst_maybe ping kmtaylor and see if he knows anyone that has seen similar things18:48
esbergluthorst_: A nova-powervm run went straight through after restarting. Gonna sit back and see what happens18:51
thorst_yeah, but the issue seems to be that when nova things come in18:53
thorst_something gets wedged18:53
thorst_and then we're stuck behind that.18:53
*** edmondsw has joined #openstack-powervm19:17
esbergluthorst_: who is kmtaylor19:18
thorst_kurt taylor19:18
thorst_the PowerKVM lead19:18
thorst_he should be able to route us to the right PowerKVM CI owner19:18
thorst_I know that this is also hosted in the POK lab19:19
adreznecthorst_: is it still rfolco?19:24
esbergluHe's offline, anyone else that might know?19:25
esbergluAnd are we assuming nova is taking that much longer just because it's larger?19:25
thorst_not sure...19:27
adreznecesberglu: I think the maintainers are rfolco and mmedvede19:29
adreznecMikhail is online, you could try pinging him19:30
adreznecOr I can if you want19:31
tjakobsthorst_ how much time were you thinking for a max when randomizing the sleep for the retries? Also, did you mean to change the topic in gerrit?19:33
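For reference, the retry-with-randomized-sleep idea being asked about can be sketched roughly as follows. This is not the nova-powervm patch itself: the function name and parameters are hypothetical, the default of three attempts simply mirrors the "Retry up to 3 times on disk create" patch title that appears in this log, and `max_sleep=5.0` is a placeholder because the maximum is exactly the open question here.

```python
import random
import time


def retry_with_jitter(func, attempts=3, max_sleep=5.0,
                      sleep=time.sleep, rng=random.uniform):
    """Call func() up to `attempts` times, sleeping a random time between tries.

    `sleep` and `rng` are injectable so the behavior can be tested without
    actually waiting. max_sleep is a placeholder value, not a recommendation.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                # Out of retries: let the last error propagate to the caller.
                raise
            # A random delay keeps concurrent callers from retrying in lockstep.
            sleep(rng(0, max_sleep))
```

The randomization matters when several builds hit the same failure at once: with a fixed delay they would all retry at the same moment and collide again, whereas a random delay spreads the retries out.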
*** tlian has quit IRC19:41
esbergluthorst_: KVM basically said the same thing we were thinking. They switched to github and are considering setting up a local mirror19:42
adreznecThey're also in POK19:44
thorst_esberglu: if they set up a clone, we might as well just make one single clone for us both to use?20:09
thorst_maybe up on jupiter or something.20:09
thorst_esberglu_: why doesn't ours say "non-voting" on this?
esberglu_I think it's because we set up our voting through the zuul pipeline and not through the job definition21:12
openstackgerritTaylor Jakobson proposed openstack/nova-powervm: Retry up to 3 times on disk create
esberglu_I don't think it's a big deal. Other non-voting CI systems also don't show up as "non-voting"21:12
esberglu_(I know I gave you opposite info on this before when you were asking about whether we were voting)21:13
esberglu_Like Intel PCI CI on that patch you linked. They don't show up as "non-voting" but they don't vote21:13
*** apearson has quit IRC22:16
*** tlian has quit IRC22:23
