Thursday, 2016-11-10

00:00 *** chas_ has quit IRC
00:02 *** chas_ has joined #openstack-powervm
00:06 *** thorst_ has joined #openstack-powervm
00:07 *** chas_ has quit IRC
00:09 *** esberglu has joined #openstack-powervm
00:16 *** thorst_ has quit IRC
00:17 *** thorst_ has joined #openstack-powervm
00:19 *** esberglu has quit IRC
00:25 *** thorst_ has quit IRC
00:36 *** chas_ has joined #openstack-powervm
00:40 *** chas_ has quit IRC
00:42 *** jwcroppe has joined #openstack-powervm
00:58 *** seroyer has joined #openstack-powervm
01:04 *** smatzek has joined #openstack-powervm
01:23 *** thorst_ has joined #openstack-powervm
01:26 *** thorst_ has quit IRC
01:27 *** thorst_ has joined #openstack-powervm
01:36 *** chas_ has joined #openstack-powervm
01:41 *** chas_ has quit IRC
01:49 *** tblakes has quit IRC
01:54 *** smatzek has quit IRC
02:17 *** seroyer has quit IRC
02:55 *** jwcroppe has quit IRC
02:55 *** jwcroppe has joined #openstack-powervm
<openstackgerrit> Drew Thorstensen (thorst) proposed openstack/ceilometer-powervm: Update the init of the collector to take in a conf
04:08 *** thorst_ has quit IRC
04:08 *** thorst_ has joined #openstack-powervm
04:17 *** thorst_ has quit IRC
04:29 *** tjakobs has joined #openstack-powervm
04:33 *** tjakobs has quit IRC
04:39 *** chas_ has joined #openstack-powervm
04:43 *** chas_ has quit IRC
05:15 *** thorst_ has joined #openstack-powervm
05:20 *** adi___ has quit IRC
05:21 *** adi___ has joined #openstack-powervm
05:22 *** thorst_ has quit IRC
05:40 *** k0da has joined #openstack-powervm
06:21 *** thorst_ has joined #openstack-powervm
06:27 *** thorst_ has quit IRC
06:40 *** chas_ has joined #openstack-powervm
06:45 *** chas_ has quit IRC
07:27 *** esberglu has joined #openstack-powervm
07:31 *** esberglu has quit IRC
07:48 *** openstackgerrit has quit IRC
07:48 *** openstackgerrit has joined #openstack-powervm
08:10 *** madhaviy has joined #openstack-powervm
08:26 *** esberglu has joined #openstack-powervm
08:28 *** thorst_ has joined #openstack-powervm
08:31 *** esberglu has quit IRC
08:37 *** thorst_ has quit IRC
08:42 *** chas_ has joined #openstack-powervm
08:47 *** chas_ has quit IRC
08:53 *** chas_ has joined #openstack-powervm
09:21 *** esberglu has joined #openstack-powervm
09:26 *** esberglu has quit IRC
09:35 *** thorst_ has joined #openstack-powervm
09:42 *** thorst_ has quit IRC
10:10 *** k0da has quit IRC
10:37 *** madhaviy has quit IRC
10:38 *** madhaviy has joined #openstack-powervm
10:40 *** thorst_ has joined #openstack-powervm
10:47 *** thorst_ has quit IRC
11:06 *** esberglu has joined #openstack-powervm
11:07 *** smatzek has joined #openstack-powervm
11:10 *** esberglu has quit IRC
11:26 *** edmondsw has joined #openstack-powervm
11:33 *** madhaviy has quit IRC
11:34 *** edmondsw has quit IRC
11:35 *** edmondsw has joined #openstack-powervm
11:45 *** thorst_ has joined #openstack-powervm
11:53 *** thorst_ has quit IRC
12:21 *** esberglu has joined #openstack-powervm
12:25 *** esberglu has quit IRC
12:47 *** thorst_ has joined #openstack-powervm
12:47 *** thorst_ has quit IRC
12:47 *** thorst_ has joined #openstack-powervm
12:50 *** kylek3h has quit IRC
13:00 *** esberglu has joined #openstack-powervm
13:04 <thorst_> adreznec: I added you to 395560.  The z team added me due to discussions at the summit, but I think you did all the infra setup and probably can do a better technical evaluation of that than I
13:18 *** svenkat has joined #openstack-powervm
13:21 *** svenkat_ has joined #openstack-powervm
13:25 *** svenkat has quit IRC
13:25 *** svenkat_ is now known as svenkat
13:29 <adreznec> thorst_: OK, I'll take a look later
13:29 *** kylek3h has joined #openstack-powervm
13:30 <adreznec> #startmeeting PowerVM CI Meeting
13:30 <openstack> Meeting started Thu Nov 10 13:30:34 2016 UTC and is due to finish in 60 minutes.  The chair is adreznec. Information about MeetBot at
13:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:30 <openstack> The meeting name has been set to 'powervm_ci_meeting'
13:30 *** wangqwsh has joined #openstack-powervm
13:30 <adreznec> All right. Roll call?
13:31 <adreznec> All right, looks like we have enough to get started
13:31 <adreznec> #topic Current Status
13:32 <adreznec> thorst_ or esberglu, want to kick things off here?
13:32 <thorst_> so I think my (and really efried's) contribution is that the fifo_pipo is almost done
13:33 <thorst_> I had a working version yesterday (no UT yet), then efried did a rev
13:33 <thorst_> I'm not sure if we've validated that rev yet.  efried?
13:33 <efried> No chance it's working now ;-)
13:33 <thorst_> I have full faith!
13:33 *** dwayne_ has quit IRC
13:33 <thorst_> but want proof as well
13:33 <efried> Can you try it out with whatever setup you used yesterday, thorst_?
13:34 <thorst_> efried: yep - can do
13:34 <thorst_> then we'll ask esberglu to take it in the CI while we work UT and whatnot
13:34 <esberglu> I loaded the devstack CI with the version Drew put up yesterday
13:34 <thorst_> well, it is awesome.
13:35 <adreznec> Not quite that awesome, but still good to get it tested
13:35 <thorst_> how'd that go?
13:35 <esberglu> The tempest runs are still just looping, looking for a matching LU with checksum xxx
13:35 <efried> Right, I was going to look at that this morning.
13:36 <esberglu> So it's not even getting to the tempest stuff at this point
13:37 <efried> uh, is that the right log?
13:37 <efried> oh, is it the that's doing that?
13:37 <thorst_> efried: right...we pre-seed the image in the SSP
13:37 <adreznec> super early
13:38 <efried> oh, we've seen this before.
13:38 <thorst_> #action thorst to validate efried's rev of remote patch
13:38 <efried> #action efried to diag/debug "looking for LU with checksum" loop
13:39 <adreznec> We have? Hmm, must be forgetting
13:39 <esberglu> #action esberglu to put remote patch in CI when ready
13:39 <adreznec> What other status do we have? wangqwsh, I saw an email from you with boot speed issues?
13:39 <esberglu> Yay, I have an action this time
13:40 <wangqwsh> not sure the reason
13:40 <adreznec> How slow are we talking here?
13:40 <thorst_> wangqwsh: how slow of a boot are we talking?
13:40 <adreznec> echo echo echo
13:40 <wangqwsh> more than 3 hours
13:41 <wangqwsh> they are still in deleting
13:41 <adreznec> Wow, that's a lot worse than I expected
13:41 <esberglu> Ahh, I bet they hit the marker LU thing
13:41 <thorst_> I thought we'd be talking a couple mins here.
13:41 <adreznec> OK, and this is on neo14?
13:41 <thorst_> but it's deleting?
13:41 <wangqwsh> yes, neo14
13:41 <wangqwsh> I am trying to delete them.
13:42 <wangqwsh> because of a spawning error.
13:42 <efried> Okay, the thing esberglu is seeing is because there's a stale marker LU somewhere that's blocking the "seed LPAR" creation.
13:43 <esberglu> Yeah. I will go in and delete that marker LU now
13:43 <efried> And by "stale" I mean "who knows?"
13:43 <adreznec> wangqwsh: I take it they never finished booting?
13:43 <adreznec> Yeah, it sounds like they got blocked
13:44 <efried> So, people.
13:44 <efried> Is it possible for the compute process to die completely without cleaning up?
13:44 <efried> Possible of course theoretically - but in the CI environment
13:45 <efried> Cause Occam's razor says that's the most likely way we end up in this stale-marker-LU state.
13:45 <thorst_> efried: sure - if we hit the timeout, it'll leave it around
13:45 <thorst_> but we have a cleanup process
13:46 <thorst_> now whether or not that cleanup process is cleaning out the markers...
13:46 <efried> Right - the 'finally' block will still get executed.
13:46 <efried> Yes, 'finally' cleans up the marker LU.
13:46 <thorst_> no no...
13:46 <thorst_> I mean, we can just straight kill the run from zuul
13:46 <thorst_> shut down the VM
13:46 <thorst_> mid process
13:47 <thorst_> that could, I guess, leave the marker around...
13:47 <efried> oh, for sure.
13:47 <thorst_> I'd think that'd be rare
13:47 <adreznec> An edge case, but definitely possible
13:47 <efried> Well, I can assure you our cleanup process doesn't delete marker LUs.
13:47 <efried> At that level.
13:48 <efried> Cause how would we know which ones were safe to delete?
13:48 <thorst_> efried: agree.
13:49 <efried> But it seems we're seeing this very frequently.
13:49 <thorst_> though I think that the CI could have a job that says 'hey...if we have a marker LU that's been around for more than 60 minutes...it's time to delete it'
13:49 <thorst_> that would be 100% specific to the CI though
13:49 <thorst_> because we know that nothing in there should take more than an hour.
13:49 <efried> But would that mask it if we had a real bug that causes a real upload hang?
13:50 <thorst_> but I'd rather try to find the root cause (I don't think we have yet) before we do that
13:50 <thorst_> efried: yep...exactly
13:50 <efried> no, we definitely have not identified the root cause.
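The age-based cleanup job floated here could be sketched roughly as below. Everything in it is hypothetical: a real job would list the SSP's logical units through pypowervm, and the 'part' name prefix for marker LUs is inferred from the log excerpt pasted later in the meeting.

```python
from datetime import datetime, timedelta

# "nothing in there should take more than an hour"
STALE_AFTER = timedelta(minutes=60)

def stale_marker_lus(lus, now=None):
    """Return the names of marker LUs old enough to be considered leaked.

    'lus' is any iterable of (name, created_at) pairs; in a real CI job
    these would come from listing the SSP's logical units via pypowervm.
    Marker LUs are assumed to carry a 'part' name prefix.
    """
    now = now or datetime.utcnow()
    return [name for name, created in lus
            if name.startswith('part') and now - created > STALE_AFTER]
```

As noted in the discussion, this masks (rather than fixes) whatever leaks the markers, so it only makes sense alongside logging that records which markers were reaped.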
13:50 <thorst_> I wonder if we should add to the CI job something that prints out all the LUs before it runs
13:50 <thorst_> before the devstack, even...
13:51 <esberglu> I can add that in, it would be super easy
13:51 <adreznec> You're thinking for debug?
13:51 <thorst_> adreznec: yah
13:51 <adreznec> Seems like a fair place to start
13:51 <thorst_> just to know, was it there before we even started
13:51 <efried> And at the end of a run, grep the log for certain messages pertaining to marker LUs.
13:51 <adreznec> Is there anything else we'd want to add to that debug info?
13:51 <thorst_> and *maybe* to the pypowervm bit
13:51 <thorst_> as part of a warning log, when we detect a marker LU...but I thought we had that.
13:52 <thorst_> down in pypowervm itself...
13:52 <efried> 2016-11-10 02:53:12.099 INFO pypowervm.tasks.cluster_ssp [req-dcce7f01-a1fd-470b-a174-9fc51f9a4a05 admin admin] Waiting for in-progress upload(s) to complete.  Marker LU(s): ['partf59362e8image_base_os_2f282b84e7e608d5852449ed940bfc51']
13:52 <efried> I'm not seeing pypowervm DEBUG turned on in this log, btw, esberglu.
13:52 <efried> I thought we made that change.
13:53 <thorst_> efried: we had it off because of all the gorp.  Now that gorp is gone...did we make that change in CI yet?
13:53 <adreznec> I thought so...
13:53 <esberglu> We did. I wonder if I didn't pull down the newest neo-os-ci last redeploy
13:54 <efried> Merged on 11/3
13:54 <thorst_> #action esberglu to figure out where the pypowervm debug logs are
13:54 <esberglu> I definitely did not pull down the newest...
13:54 <adreznec> That sounds bad
13:55 <esberglu> If I go into jenkins and manually kill jobs, would that leave the marker LUs around?
13:55 <adreznec> Does zuul delete the VM if the job ends?
13:56 <adreznec> Have to step away for a couple minutes, please continue without me
13:56 <efried> esberglu, I think the answer is yes
13:56 <thorst_> esberglu: well, if it's stuck in an upload
13:57 <thorst_> efried: I wonder if we could/should get the VM's name in the marker LU
13:57 <efried> Anything that aborts the compute process.
13:57 <esberglu> Ahh, then I'm probably to blame. I do that sometimes when I need to redeploy but don't want to wait for all the runs to finish
13:57 <thorst_> esberglu: when we redeploy though, I thought we cleaned out the SSP?
13:57 <esberglu> Or just redeploy the management playbook, which also just kills stuff
13:57 <esberglu> Yeah, redeploys do
13:57 <esberglu> clean out the ssp
13:57 <efried> And since upload takes "a while", you can easily catch a thread that's in the middle of one.
13:58 <thorst_> maybe we need to add something to the management playbook to clean out the SSPs.
13:58 <thorst_> but efried, thoughts on adding a VM name?
13:58 <efried> Right - I had considered it, but it's kinda tough with name length restrictions.
13:59 <efried> I can make it work if we think it's critical.
13:59 <thorst_> not sure...let's see how these other actions pan out?
13:59 <efried> Though I was going to make it some part of the MTMS
13:59 *** apearson has joined #openstack-powervm
13:59 <efried> The VM name doesn't really help us much.
13:59 <thorst_> host + lpar id...
13:59 <thorst_> that'd be a start
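A naming scheme like the one being discussed (host MTMS plus LPAR id baked into the marker name for traceability) might look like the sketch below. The 64-character cap, the field layout, and the function name are all assumptions for illustration; the real marker-LU naming lives in pypowervm, and the actual length limit may differ.

```python
MAX_LU_NAME_LEN = 64  # assumed cap; the real VIOS limit may differ

def marker_lu_name(digest, host_mtms, lpar_id, image_name):
    # Hypothetical layout: keep the existing 'part<digest>' prefix so
    # marker detection still matches, then append the creating host's
    # MTMS and LPAR id so a stale marker can be traced to its creator.
    name = 'part%s_%s_%s_%s' % (digest, host_mtms, lpar_id, image_name)
    # Truncate from the right so the prefix (what detection keys on)
    # always survives the length cap.
    return name[:MAX_LU_NAME_LEN]
```

The truncation direction matters: dropping tail characters keeps the prefix and host fields intact even for long image names.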
13:59 <thorst_> changing it, though, has implications in the field...though I think this is low-use code at the moment
14:00 <efried> But yeah, if esberglu is interrupting runs, that's going to be the culprit most of the time.
14:00 <thorst_> esberglu: can we add something in the management playbook that cleans out the compute node markers (or SSPs)?
14:01 <thorst_> I mean, honestly, we want to clear out all of the computes
14:01 <thorst_> but just not redeploy.
14:01 <esberglu> So the computes get cleaned out when the compute playbook is run
14:01 <esberglu> But not in the management playbook
14:01 <efried> This is something that covers all the nodes sharing a given SSP?
14:01 <thorst_> efried: it'd be all nodes in the given cloud
14:02 <efried> Are we close to done?  I've got another meeting.
14:02 <thorst_> esberglu: yeah, it's weird...because it's the management playbook.  But I think you're running that just to build new images.
14:02 <thorst_> so, I think you want to pull the clean of the compute nodes into that...or make a new playbook altogether for it
14:02 <esberglu> Yeah, exactly
14:02 <thorst_> efried: yeah, me too
14:03 <esberglu> #action esberglu find a way to clean compute nodes from management playbook
14:04 <esberglu> The only other thing I had was for wangqwsh
14:04 <esberglu> The read-only filesystem was fixed
14:04 <thorst_> is that an assertion or a question?
14:05 <wangqwsh> esberglu: cool
14:05 <esberglu> And I redeployed the staging env. with the newest versions of both OSA CI patches and the latest local2remote. I'm still getting stuck trying to build the wheels
14:05 <esberglu> It looks like you got past all of the bootstrap stuff and to the tempest part?
14:06 <wangqwsh> yes, at the tempest
14:06 <wangqwsh> you can use this review:
14:07 <wangqwsh> let me find it
14:08 <esberglu> That review is already deployed in the environment
14:08 <wangqwsh> I added some variables and scripts for OSA
14:09 <esberglu> Yeah, I have the latest version of that deployed
14:10 <esberglu> I will send an email with more info about what I am hitting, too much info for IRC
14:10 <adreznec> Sounds like we're done then
14:10 <adreznec> Thanks all
14:10 <wangqwsh> I can take a look at it
14:10 <thorst_> good job team
14:10 <openstack> Meeting ended Thu Nov 10 14:10:36 2016 UTC.  Information about MeetBot at . (v 0.1.4)
14:10 <openstack> Minutes (text):
14:10 <adreznec> P.S. only the host can do that for the first 60 minutes, esberglu
14:11 <esberglu> Haha, I thought everyone else was gone
14:11 *** tblakes has joined #openstack-powervm
14:12 <adreznec> Oh man
14:12 <adreznec> ocata-1 is next week
14:12 <thorst_> adreznec: see, os-vif is actually going in
14:12 <thorst_> and being used
14:13 <adreznec> Yeah... and I don't think we've really done much testing with it. Not that it shouldn't just work
14:17 *** dwayne_ has joined #openstack-powervm
14:21 *** mdrabe has joined #openstack-powervm
14:23 *** smatzek has quit IRC
14:24 <thorst_> efried: is this on your radar?
14:24 <openstack> Launchpad bug 1637243 in pypowervm "Failure in TaskFlow returns large WrappedFailure exception to the user" [Undecided,New] - Assigned to Eric Fried (efried)
14:24 <thorst_> I'm getting pinged on it.
14:25 *** esberglu has quit IRC
14:38 *** esberglu has joined #openstack-powervm
14:44 *** seroyer has joined #openstack-powervm
15:04 <efried> thorst_, yeah, I was working on it yesterday in between the upload business.  I should have a change set up for review today, hopefully.
15:09 *** seroyer has quit IRC
15:11 *** smatzek has joined #openstack-powervm
15:13 <esberglu> thorst_: efried: adreznec: I put up change 4486 for the management playbook to also clean out the VMs
15:20 *** wangqwsh has quit IRC
15:25 *** tjakobs has joined #openstack-powervm
15:28 <thorst_> efried: your patch seems to be working...I just need to verify the cleanup bits.  Otherwise though, looking solid
15:28 <efried> I put up a minor edit for the vopt thing.
15:28 <efried> You wouldn't have hit the bug in the remote env, but would've in local.
15:30 <thorst_> this actually uses next to no CPU
15:30 <thorst_> I really like it
15:34 *** adreznec has quit IRC
15:37 *** adreznec has joined #openstack-powervm
15:38 *** kriskend has joined #openstack-powervm
16:45 *** seroyer has joined #openstack-powervm
16:48 *** seroyer has quit IRC
17:00 *** mdrabe has quit IRC
17:09 <thorst_> efried: change looks solid
17:10 <thorst_> esberglu: you should probably integrate that change while we then fix up UT
17:10 <esberglu> Will do
17:10 *** mdrabe has joined #openstack-powervm
17:34 *** dwayne_ has quit IRC
17:36 *** dwayne_ has joined #openstack-powervm
17:46 <thorst_> efried: small q on 4487
<thorst_> efried: also this one
18:17 *** k0da has joined #openstack-powervm
19:08 <efried> thorst_, 4487 - did you want me to replace the tabs with spaces, or remove them entirely, or...?
19:12 <thorst_> well, it just meant the first line of the new exception would have a tab
19:12 <thorst_> and the rest of the lines wouldn't
19:12 <thorst_> and that seemed...weird to me.
19:24 <efried> thorst_, I don't think so.
19:24 <efried> self.msg_fmt = _("FeedTask %(ft_name)s experienced multiple "
19:24 <efried>                          "exceptions:\n\t%(concat_msgs)s")
19:24 <efried>         concat_msgs = '\n\t'.join([fail.exception_str
19:25 <efried> So like
19:25 <efried> FeedTask foo experienced multiple exceptions\n\tFirst message\n\tSecond message\n\tThird message.
19:27 <efried> Here's what the formatted string in the test case looks like:
19:27 <efried> FeedTask ft experienced multiple exceptions:
19:27 <efried>         this is an exception on lpar1!
19:27 <efried>         this is an exception on lpar2!
19:41 <thorst_> what happens to the exception stack?  Just hidden?
19:45 <efried> thorst_, I'm logging the wrapped exceptions via LOG.exception before I raise the MultipleExceptionsInFeedTask one (which is the one that has the concatenated message).
19:45 <thorst_> got it
19:45 <thorst_> OK - excellent
19:45 <efried> And the MultipleExceptionsInFeedTask exception itself also has a wrapped_failure attribute
19:45 <efried> which contains the original WrappedFailure object, which can be dissected to get right at the stack traces, messages, exception objects, etc.
19:46 <efried> ...if the caller wanted to trap it via try/except.
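The message layout pasted in this exchange is easy to reproduce standalone. The sketch below is illustrative only: FakeFailure stands in for taskflow's wrapped-failure objects (which expose an exception_str much like this), and format_feedtask_error is a hypothetical helper, not the pypowervm code itself.

```python
class FakeFailure(object):
    """Stand-in for a taskflow failure exposing 'exception_str'."""
    def __init__(self, msg):
        self.exception_str = msg

def format_feedtask_error(ft_name, failures):
    # One headline, then each wrapped failure's message on its own
    # tab-indented line - mirroring the msg_fmt pasted above.
    concat_msgs = '\n\t'.join(f.exception_str for f in failures)
    return ("FeedTask %(ft_name)s experienced multiple exceptions:\n\t"
            "%(concat_msgs)s" % {'ft_name': ft_name,
                                 'concat_msgs': concat_msgs})

print(format_feedtask_error('ft', [
    FakeFailure('this is an exception on lpar1!'),
    FakeFailure('this is an exception on lpar2!')]))
# FeedTask ft experienced multiple exceptions:
#     this is an exception on lpar1!
#     this is an exception on lpar2!
```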
19:47 <efried> thorst_: Let's talk about 4475 too if you have a sec.
19:48 <thorst_> let me look it up
19:48 *** chmod666org has joined #openstack-powervm
19:48 *** openstackgerrit has quit IRC
19:48 <thorst_> fire it up
19:48 <thorst_> I do not have a suggestion.  In my head I was thinking a string that resolved to a class name and have the adapter load that
19:48 <thorst_> but that's gross.
19:49 *** openstackgerrit has joined #openstack-powervm
19:55 <efried> thorst_, that IS gross.
19:55 <efried> But how do you feel about me just parsing the XML "by hand" in the vios_busy_retry_helper?
19:55 <efried> It's not too many LOC, fairly straightforward.
19:56 <efried> pity not to use the wrapper class, but if you think the benefits outweigh...
19:56 <thorst_> efried: to me...I think I'm OK with that...
19:56 <thorst_> I hate having imports in weird places
19:56 <efried> I just don't want you to think that *any* helper will have this kind of import problem.
19:56 <efried> Although tbh, they kinda will.
19:57 <efried> Cause what other kind of thing would we be looking for in a Response?
19:57 <thorst_> not sure.
19:57 <efried> ...other than something we have a wrapper for
19:57 <efried> So anyway, those are the options.
19:57 <efried> I can't think of a reasonable way to get the import jankitude into the Adapter.
19:57 <thorst_> I can just shut up about the imports
19:57 <thorst_> I'm being kind of a princess about it
19:57 <efried> It'll actually compile okay with an unknown symbol in the kwarg value - but as soon as you run it, you'd better have that symbol resolved.
19:58 <thorst_> and if the other way isn't better, there's no reason for you to do it just because I whine about it.
19:58 <efried> It's only better because it avoids the deferred imports.
19:58 <thorst_> I can just resolve that this is the best solution...
19:58 <thorst_> maybe a comment as to why we did this awful thing
19:58 <thorst_> and be done at that
19:58 <thorst_> (if not already there)
19:58 <efried> I won't advocate or veto either one.  I'd almost prefer the direct-parsing solution.  Your call.
19:59 <efried> btw, this isn't a problem for consumers of the adapter/helpers, since they'll be importing both - no circularity.  As proven by community code.
20:00 <thorst_> I defer to you.
20:07 <efried> thorst_, what about 4487?  We good there at this point?
20:08 <efried> PM took a rather strict interpretation of our conversation yesterday and put a due date of 11am today on it.
20:08 <thorst_> need a bit...chatting with someone on SDN
20:18 *** chas_ has quit IRC
20:28 *** kylek3h has quit IRC
20:33 *** k0da has quit IRC
20:34 *** k0da has joined #openstack-powervm
20:51 *** thorst_ has quit IRC
20:51 *** thorst_ has joined #openstack-powervm
20:56 *** thorst_ has quit IRC
21:02 *** seroyer has joined #openstack-powervm
21:09 *** seroyer has quit IRC
21:19 *** chas_ has joined #openstack-powervm
21:21 *** smatzek has quit IRC
21:23 *** chas_ has quit IRC
21:29 *** kylek3h has joined #openstack-powervm
21:30 <kriskend> just figured out how to get the Guru Meditation log for nova-compute.  It is very helpful for debugging hang conditions, because it shows you the stack trace for nova threads
21:30 <kriskend> sudo kill -USR2 32934
21:30 <kriskend> for the nova process
21:31 <kriskend> in devstack, the output should go to the nova-compute log
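For reference, the heart of what the Guru Meditation Report does on SIGUSR2 - dump a stack trace for every live thread - can be approximated with the standard library alone. This is a simplified sketch, not nova's actual implementation (which comes from oslo.reports):

```python
import signal
import sys
import threading
import traceback

def dump_threads(signum=None, frame=None, out=sys.stderr):
    # Write a stack trace for every live thread, roughly what the Guru
    # Meditation Report's "threads" section shows for hang debugging.
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, stack in sys._current_frames().items():
        out.write("Thread %s (%s):\n" % (ident, names.get(ident, '?')))
        traceback.print_stack(stack, file=out)

# Wire it up so `kill -USR2 <pid>` triggers a dump (main thread only):
signal.signal(signal.SIGUSR2, dump_threads)
```

The handler can also be called directly, which is handy for exercising it without sending a signal.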
21:46 *** thorst_ has joined #openstack-powervm
21:56 <esberglu> efried: thorst: At some point while the prod. CI was down, master devstack broke for us.
21:57 <thorst_> undercloud or AIO?
21:57 <esberglu> AIO. We use stable/newton for the undercloud now
22:01 *** smatzek has joined #openstack-powervm
22:02 <thorst_> yeah...we'll need to debug.  I think efried just fought through a bunch of this...
22:02 *** svenkat has quit IRC
22:02 *** smatzek has quit IRC
22:02 <thorst_> but he may have ultimately rolled back to stable/newton
22:05 <efried> Yeah, I gave up on master.
22:06 *** apearson has quit IRC
22:09 <esberglu> Crap. There are some stable/newton runs going, so we can see what happens there
22:11 <esberglu> Not good
22:14 *** smatzek has joined #openstack-powervm
22:14 <efried> esberglu: Have we had a successful CI run with the latest 4458?
22:15 <efried> 2016-11-10 16:06:41.682 ERROR oslo.messaging._drivers.impl_rabbit [-] [5b362dca-295c-41c1-9c66-65308f610e9b] AMQP server on is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: 38080
22:15 <efried> 2016-11-10 16:06:41.735 ERROR oslo.messaging._drivers.impl_rabbit [-] [d9b8fdfd-7084-403e-9ef9-9a6f0cb5601f] AMQP server closed the connection. Check login credentials: Socket closed
22:15 <efried> 2016-11-10 16:06:41.775 ERROR oslo_messaging.rpc.server [-] Can not acknowledge message. Skip processing
22:15 <efried> This breaks the pipe and kills the fifo_reader.
22:16 <esberglu> Nope. Whatever broke master devstack came in while I was redeploying with the latest. That was the first run to go through with it in
22:17 <efried> Oh.  So whatever thorst_ said was looking good earlier... wasn't CI.
22:21 <efried> thorst_, esberglu: Looks like we need to ignore-and-retry on EINTR.  Which is... weird.
22:21 <efried> Will happen automagically in py3
22:25 <efried> ye gods, this would get ugly fast.
22:25 <efried> If we have to trap and ignore EINTR on every bloody open() and read() and write() and close()...
22:29 *** seroyer has joined #openstack-powervm
22:31 <efried> thorst_, esberglu: Uploaded a new patch set to 4458 which should work around this EINTR thing - which oughtta be intermittent anyway.  I don't like it.
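The workaround being described amounts to looping on EINTR, which is exactly what PEP 475 made automatic for system calls in Python 3.5+. A generic retry wrapper (a sketch of the idea, not the actual 4458 patch) might look like:

```python
import errno

def retry_eintr(call, *args, **kwargs):
    # Re-issue a syscall-backed call (open/read/write/close on the FIFO)
    # if a signal interrupts it with EINTR.  Python 3.5+ retries these
    # automatically (PEP 475), so this only matters on older runtimes.
    while True:
        try:
            return call(*args, **kwargs)
        except (IOError, OSError) as exc:
            if exc.errno != errno.EINTR:
                raise  # real errors still propagate
```

Usage would be, e.g., `data = retry_eintr(os.read, fd, 4096)` in place of a bare `os.read` call, so the retry logic lives in one spot instead of at every call site.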
22:32 *** seroyer has quit IRC
22:35 <esberglu> Alright. I have to go take care of some lease stuff. And then gerrit is down, so I will do that first thing tomorrow
22:36 *** esberglu has quit IRC
22:37 *** esberglu has joined #openstack-powervm
22:41 <thorst_> efried, esberglu: OK
22:41 *** esberglu has quit IRC
22:46 *** kylek3h has quit IRC
22:50 *** esberglu has joined #openstack-powervm
22:54 *** esberglu has quit IRC
22:58 *** chas_ has joined #openstack-powervm
23:02 *** chas_ has quit IRC
23:04 *** chas_ has joined #openstack-powervm
23:07 *** smatzek has quit IRC
23:08 *** mdrabe has quit IRC
23:09 *** chas_ has quit IRC
23:10 *** kriskend has quit IRC
23:11 *** tblakes has quit IRC
23:15 *** kylek3h has joined #openstack-powervm
23:17 *** k0da has quit IRC
23:19 *** tjakobs has quit IRC
23:29 *** esberglu has joined #openstack-powervm
23:29 *** esberglu has quit IRC
23:30 *** esberglu has joined #openstack-powervm
23:34 *** esberglu has quit IRC
23:43 *** seroyer has joined #openstack-powervm
23:45 *** seroyer has quit IRC
