Tuesday, 2017-03-28

*** smatzek has joined #openstack-powervm00:59
thorstefried adreznec: know of any tools that test the underlying cloud performance?  Not things like how well the APIs scale (that's Rally), but how well I/O runs across X VMs on Y hosts12:41
esberglu#startmeeting powervm_driver_meeting13:00
openstackMeeting started Tue Mar 28 13:00:05 2017 UTC and is due to finish in 60 minutes.  The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.13:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.13:00
openstackThe meeting name has been set to 'powervm_driver_meeting'13:00
esberglu#topic In Tree Driver13:02
esbergluefried: I was looking through the changesets and CI really didn't like power on/off and after13:02
esbergluOh wait nvm13:03
efriedYeah, I don't think that was us.13:03
efriedSomething fundamental done broke.13:03
esbergluYeah I think it was an issue with certain CIs13:03
esbergluBut looks like verified +1s now13:03
esbergluSo I owe you a couple reviews13:04
efriedI don't know if you caught this comment last week, but...13:04
efriedIf you ever feel down about our CI success rate, just look at freakin xenserver.13:05
efriedThat guy fails like 50% of the time.13:05
efriedAnd that one is gating.13:05
efriedDoes anyone have an OOT setup with recent (last 2 weeks, say) nova code underneath it where we can verify that glance bug?13:07
esbergluYeah I think we are fine from a success rate view. At least most days :-)13:07
efriedI'm still restacking my test system to debug it.  But I need to prove that it affects OOT so I can open a launchpad bug.13:07
thorstefried: I don't13:07
adreznecefried: Nope, still on Ocata over here13:09
jayasankar_efried: I'm reconfiguring neo34 for OOT, but got stuck on some issues, which I'm looking into.13:09
efriedOtherwise in-tree just need reviews, at least up to 'console'.  (I don't want to move SSP into the ready list until we figure this bug out.)13:10
efriedthorst I may need your help with the bug13:10
efried"Monkey patch the glance API code in nova" is my only solution right now.13:11
thorstuhhh, that's awful13:11
thorstif you have a setup where it's borked I can take a peek13:11
efriedYeah, I assume that's not a viable solution.13:11
thorsttotes not viable.13:11
efriedthorst I ought to have that by the time this meeting is over.  Stacking now.  And that always succeeds.13:12
adreznecShould be ready in 10 minutes then efried13:12
efriedbtw, wanna queue up a topic for after the meeting: I have a sneaking suspicion that, when a system has been running for a long time, things go pear-shaped.13:12
adreznecThat sounds bad, but ok13:13
thorstyeah, curious about that too...because we've been running CI for months13:13
thorstbut...post scrum topic13:13
esberglu#topic OOT Driver13:14
esbergluAnyone have anything here?13:14
efriedI've been accumulating changes from in-tree to backport to OOT.13:15
efriedI have some of them in a (not-yet-proposed) commit.13:15
efriedBut some things have come up that will require a much wider effort.13:15
efriedLike autospeccing.13:15
thorstI know Shyama will be proposing fixes for LPM w.r.t. Cinder and File backed volumes.13:15
thorstshe's taking over a change set from me13:15
efriedI guess I don't really have an action item to propose here, but I do want to announce that I'll be requiring new UT to autospec anything coming from pypowervm from this point forward.13:16
thorstfair enough...13:16
efriedAnd it won't hurt my feelings if people want to go retrofit existing UTs with autospec.13:16
adreznecThe ephemeral file support is still on hold until we can get those pesky REST changes implemented. Probably a couple sprints out still tbh13:16
thorstadreznec: and then we need pypowervm updates?13:17
adreznecOnce the REST side is done13:17
thorstgood thing we have a new versioning approach there13:17
adreznecYeah we'll have to keep that as a topic13:17
adreznecDeciding when we need to do a version bump there13:18
adreznecFYI it looks like the change to add a global-reqs job for nova-powervm got stuck (https://review.openstack.org/#/c/440852/)13:20
adreznecThe corresponding deps merged but it didn't go in. Just bumped it13:21
adreznecDo we want to add g-r jobs for networking-powervm and ceilometer-powervm?13:21
thorstwe should I'd think13:21
adreznecI can toss those up a bit later here13:22
adreznecFairly straightforward13:22
esbergluAnything else OOT before we move on?13:22
esberglu#topic CI13:24
esbergluI've got a bunch of stuff here13:24
esbergluI believe we are ready to move up the IT CI patches to console?13:25
esbergluAnd then add the corresponding whitelist change13:25
adreznecSounds like it13:26
esbergluThen we can start getting some volume through and hunt down any issues13:26
esbergluSo I will put up that patch today13:26
esbergluOther than that there are a few things I want to get working13:27
esbergluI want to get all branches running on master tempest13:27
esbergluocata and master are fine13:27
esberglunewton is passing everything but 3 tests13:27
esbergluSo I need to figure those failures out and then we can move it up for newton13:28
esbergluI also want to get the undercloud moved from newton to ocata13:28
esbergluIt seems like we have a lull where I can try to get that going on staging13:28
esbergluI'm guessing it's going to be a bigger endeavor than just checking out a different branch13:29
esbergluThen the last big change is to fix the goofy networking stuff13:29
esbergluRight now the IT and OOT networking is different13:30
thorstdid we ever dig up that OVS note?13:30
esbergluAnd OOT networks are being created in prep_devstack.sh while IT is using the os_ci_tempest.sh13:31
esbergluAnd it's just bad13:31
*** smatzek has quit IRC13:31
esbergluthorst: Was gonna talk to you about that today if you have time13:31
thorstI'm free between 12-3 to chat about that13:32
thorstjust need to find that note...I have no idea where that thing is  :-)13:32
esbergluOkay I'll hunt it down after this13:32
thorstI seem to remember me thinking it was brilliant at the time, but I've since forgotten what that idea is13:32
esbergluThat's all I have for CI13:34
jayasankar_esberglu: We don't have any tests specific to SVC + FC in CI, right?13:35
thorstjayasankar_: we do not.13:37
thorstno cinder in the CI13:37
esbergluYep. That's why we are having you take a look13:37
adreznecjayasankar_: The only storage in the CI today is SSP13:38
efriedAnd using remote upload, at that.13:38
efriedwhich is why we didn't see problems three weeks ago.13:38
esberglu#topic Open Discussion13:39
esbergluefried: You had something here?13:39
efriedMy test system was up, not doing anything, for a couple of weeks.13:39
efriedWhen I got back to it, it was broken.13:40
efriedI've been looking at it while we've been talking, here, and I believe I've narrowed it down to the VIOS being hosed.13:40
efriedI know at least the cluster is screwed.13:40
efriedAt the moment I'm trying to figure out if it could be because another system was in the cluster, and it may have inadvertently used the cluster disks for something.13:41
adreznecNetworking issues maybe?13:41
thorstadreznec: networking never fails13:41
efriedMm, could be part of it, I suppose.  Got a weird error listing the cluster - it was saying the localhost was only reachable through the repository disk.13:41
efriedAnyway, purely anecdotally, this isn't the first time I've experienced this - left a neo alone for "a while" and come back to find it borked.13:42
adreznecWe've had systems up and running for many weeks without notable issues13:42
thorstefried: could be shared disk issues.13:42
efriedOkay, we have?  Then I'm happy.13:42
efriedI need to be reminded where that SAN is so I can make sure those disks are gone from the other neo.13:42
efriedAnd I'll contact Uma to see if she can recover it to some normal state.  I can't get anything going wrt the cluster right now.13:43
nbanteesberglu: I need help configuring tempest in OSA. I've been stuck there for the last few weeks.13:43
esberglunbante: I'm in the same boat. I just got an OSA deployment to complete the full run_playbooks script yesterday for the first time since picking OSA back up13:45
thorstefried: I can send you the v7k13:45
nbanteI faced many issues during setup, and now I'm stuck on tempest13:46
adreznecnbante: esberglu are these AIO?13:46
esbergluYeah mine is13:46
adreznecIf so, once the AIO is running you should just be able to use the gate-check-commit.sh script in the OSA repo I think13:47
adreznecA subset of which is running tempest against the AIO13:47
adreznecAll in One13:47
esbergluAll in one13:47
adreznecThat'll do a bit more than just tempest, but it'll be the same level of testing they'd do in the gate13:48
adreznecWhich is what we'd ideally want13:48
nbanteadreznec: do you have a link where I can get that script? I'll try to run that as well.13:50
*** k0da has joined #openstack-powervm13:50
adreznecnbante: It's in the scripts subdirectory of the main OSA repo13:50
adreznecSo if you have OSA cloned down, you should already have it13:50
adreznecin openstack-ansible/scripts/13:50
nbanteI've already cloned it down, so it should be there. I'll try to run it and share the results13:51
*** smatzek has joined #openstack-powervm13:54
esbergluAny final topics before I end the meeting?13:56
jayasankar_Is there a planned schedule for the IT deliverable?13:57
jayasankar_For both IT and OOT? Or do we have to complete both by 2Q?13:58
thorstjayasankar_: the OOT is there today.  IT needs to be done as patches are proposed up14:01
thorstthe core reviewers hold the key to when things get merged in...14:01
thorst(we are not core reviewers)14:01
thorstso the net is, IT needs to be tested as efried proposes them up  :-)14:01
*** edmondsw has quit IRC14:02
esbergluThanks for joining14:03
openstackMeeting ended Tue Mar 28 14:03:21 2017 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)14:03
openstackMinutes:        http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-03-28-13.00.html14:03
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-03-28-13.00.txt14:03
openstackLog:            http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-03-28-13.00.log.html14:03
efriedOkay, one more time - how do I see the size of a PV in ioscli?14:04
*** edmondsw has joined #openstack-powervm14:05
*** nbante has quit IRC14:07
efriedgot it.  geez14:07
*** edmondsw has quit IRC14:38
*** edmondsw has joined #openstack-powervm14:41
*** kylek3h_ has joined #openstack-powervm15:08
*** kylek3h has quit IRC15:10
*** jayasankar_ has quit IRC16:41
*** nbante has joined #openstack-powervm16:58
*** shyama has joined #openstack-powervm17:20
*** nbante has quit IRC18:00
*** shyama has quit IRC18:25
thorstefried: I'm going to +2 this.  Any reason to hold off on a W+1 ?  https://review.openstack.org/#/c/448381/218:39
efriedthorst fine by me18:48
*** jpasqualetto has quit IRC19:15
*** jpasqualetto has joined #openstack-powervm19:28
*** mdrabe has quit IRC19:35
efriedthorst adreznec Well, in my latest stack, the upload is hanging the entire compute process.19:56
efriedIt's hanging on open() of the pipe.20:17
efriedwtf could cause that??20:17
*** smatzek has quit IRC20:20
thorstefried: we had that for a long while...I'm thinking....20:22
efriedopen() shouldn't be able to hang.  There's no such thing as a non-advisory file lock in Linux, is there?20:23
thorstI thought when we had the 'open' hang it was actually hanging on the upload pipe20:24
thorstand it was a super esoteric code path to figure that out20:24
efriedWell, I've put debug printfs in the glance code itself, and I get my debug statement right before open() and not after.20:25
efriedSo it's not trying to write anything yet.20:25
efriedAnd yes, it's trying to open() the fifo itself.20:25
efriedbtw, I can reproduce the hang by trying to echo > the fifo from the shell.  So the hang is at the syscall level.20:28
adreznecefried: thorst So I'm not all that familiar with the new upload code, but are we setting os.O_NONBLOCK on the pipe when we open it?20:29
efried"we" aren't opening it.20:29
efriednova/image/glance.py is opening it.20:29
efriedAnd no, it's just saying 'wb'20:29
efriedActually, is O_NONBLOCK an available flag on open()?  I don't think it is.20:31
efriedIt's available on lock20:32
efriedAny case, we don't have control over that.20:34
efriedBut hey, it looks like it may actually be because the reader needs to open the sucker first.20:34
efriedSo - why isn't the REST API opening the pipe from their end?20:34
efriedThis is gonna be fun to figure out.20:34
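The blocking behaviour discussed above can be reproduced outside nova. The following is a minimal sketch, assuming Linux FIFO semantics and Python 3; the file name `upload.fifo` and the helper names are made up for illustration and are not taken from the nova or pypowervm code. It also shows that `O_NONBLOCK` is accepted by `os.open()`: with no reader attached, a non-blocking open for writing fails with `ENXIO` instead of hanging.

```python
import errno
import os
import tempfile


def open_fifo_for_write_nonblocking(path):
    """Return a write fd for the FIFO, or None if no reader has it open yet."""
    try:
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            # No process has the FIFO open for reading. A plain blocking
            # open(path, 'wb') would hang at this point instead.
            return None
        raise


def demo():
    """Exercise the FIFO with and without a reader and report what happened."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "upload.fifo")  # hypothetical name
        os.mkfifo(path)

        # 1. No reader yet: the non-blocking open reports ENXIO (None here).
        results["no_reader"] = open_fifo_for_write_nonblocking(path)

        # 2. Attach a reader. A non-blocking read-only open returns at once.
        read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        write_fd = open_fifo_for_write_nonblocking(path)
        results["with_reader"] = write_fd is not None

        # 3. With both ends open, data flows through the pipe.
        os.write(write_fd, b"hello")
        results["data"] = os.read(read_fd, 5)

        os.close(write_fd)
        os.close(read_fd)
    return results


if __name__ == "__main__":
    print(demo())
```

This only demonstrates the kernel-level behaviour. Whether the caller that opens the pipe can be changed to use such flags is a separate question, since, as noted in the log, that open happens in nova/image/glance.py.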
efriedOkay, I manually called the dummy REST API upload function from an ipython session, and it kicked the compute thread in the pants, allowing it to complete (though it still errored).20:41
efriedCould I be single-threaded??20:41
efriedHow would I find that out?20:41
efriedIs there a reason we're threading this at all?20:45
efriedDoes the upload_file thread actually block until the pipe is fully written??20:46
*** esberglu has quit IRC20:47
*** esberglu has joined #openstack-powervm20:48
*** esberglu has quit IRC20:52
efriedrat farts.  it does.20:57
efriedSo this hangs the entire compute process.21:03
efriedLike, everything stops.21:04
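One possible explanation for "everything stops", offered as an assumption rather than a diagnosis: if the compute process schedules its work cooperatively (nova services of this era ran on eventlet green threads), a single call that blocks in the kernel stalls every other task, because nothing else gets a turn until that call returns. The sketch below uses Python's standard `asyncio` as a stand-in for eventlet so that it runs without extra dependencies; the `ticker` and `blocker` names are illustrative only.

```python
import asyncio
import time


async def ticker(log, count):
    """A well-behaved task: records a tick, then yields to the scheduler."""
    for i in range(count):
        log.append(f"tick {i}")
        await asyncio.sleep(0)  # cooperative yield


async def blocker(log):
    """A badly-behaved task: makes a blocking call without yielding."""
    log.append("block start")
    time.sleep(0.05)  # stands in for a blocking open() on a FIFO
    log.append("block end")


async def main():
    log = []
    await asyncio.gather(ticker(log, 3), blocker(log))
    return log


def run_demo():
    """Run both tasks on one event loop and return the recorded order."""
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

No tick is recorded between "block start" and "block end": the ticker cannot run while the blocking call is in progress, however many CPUs the host has.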
efriedadreznec thorst How can I tell how many threads I've got?21:20
efriedneo@neo40:/opt/pvm-rest/data/fileupload$ lscpu21:21
efriedArchitecture:          ppc64le21:21
efriedByte Order:            Little Endian21:21
efriedCPU(s):                1621:21
efriedOn-line CPU(s) list:   0-1521:21
efriedThread(s) per core:    821:21
efriedCore(s) per socket:    121:21
efriedSocket(s):             221:21
efriedNUMA node(s):          121:21
efriedModel:                 2.1 (pvr 004b 0201)21:21
efriedModel name:            POWER8 (architected), altivec supported21:21
efriedHypervisor vendor:     pHyp21:21
efriedVirtualization type:   para21:21
efriedL1d cache:             64K21:21
efriedL1i cache:             32K21:21
efriedNUMA node0 CPU(s):     0-1521:21
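As a quick check on the lscpu output above, the logical CPU count is sockets x cores per socket x threads per core. The sketch below parses a cleaned-up excerpt of that output; the `logical_cpus` helper is hypothetical. Note that this counts hardware threads only. It says nothing about how many OS threads the compute process itself is running, for which `threading.active_count()` inside the process or the `Threads:` line in `/proc/<pid>/status` would be the places to look.

```python
def logical_cpus(lscpu_text):
    """Compute sockets * cores-per-socket * threads-per-core from lscpu text."""
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return (
        int(fields["Socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )


# Values copied from the lscpu output pasted in the log.
SAMPLE = """\
CPU(s):                16
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             2
"""

if __name__ == "__main__":
    print(logical_cpus(SAMPLE))  # 2 * 1 * 8 = 16, matching CPU(s)
```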
efriedSo.  Why is the hanging open() causing the entire compute process to hang?  That is the question of the day.21:22
efriedBooya, at least I figured out the EINVAL.  fsync on a FIFO.21:24
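The EINVAL mentioned above is documented Linux behaviour: fsync(2) returns EINVAL when the descriptor refers to a special file such as a pipe or FIFO, which has nothing to synchronize to disk. A minimal sketch, assuming Linux and Python 3, of a writer that tolerates this case; `fsync_if_supported` is a hypothetical helper, not part of nova or glance.

```python
import errno
import os
import tempfile


def fsync_if_supported(fd):
    """fsync the descriptor; return False if it does not support syncing."""
    try:
        os.fsync(fd)
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # Pipes, FIFOs and sockets cannot be synced on Linux.
            return False
        raise


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print("pipe synced:", fsync_if_supported(write_fd))
    os.close(read_fd)
    os.close(write_fd)

    with tempfile.TemporaryFile() as regular:
        regular.write(b"data")
        regular.flush()
        print("regular file synced:", fsync_if_supported(regular.fileno()))
```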
*** edmondsw has quit IRC21:42
*** edmondsw has joined #openstack-powervm21:42
*** jpasqualetto has quit IRC21:47
*** edmondsw has quit IRC21:47
*** thorst has quit IRC21:51
*** jwcroppe has joined #openstack-powervm22:51
*** thorst has joined #openstack-powervm23:08
thorstefried: ppc64smt I think?  Or just cat /proc/cpuinfo23:19
