Thursday, 2016-10-13

*** tblakes has joined #openstack-powervm02:03
*** k0da has joined #openstack-powervm09:20
*** thorst has quit IRC09:20
*** apearson has joined #openstack-powervm10:06
*** antonym has joined #openstack-powervm10:07
*** tonyb has joined #openstack-powervm10:07
*** efried has joined #openstack-powervm10:09
*** AndyWojo has joined #openstack-powervm10:09
*** adreznec has joined #openstack-powervm10:10
*** antonym has joined #openstack-powervm10:26
*** tonyb has joined #openstack-powervm10:26
*** adreznec has joined #openstack-powervm10:27
*** apearson has joined #openstack-powervm10:27
*** efried has joined #openstack-powervm10:27
*** AndyWojo has joined #openstack-powervm10:27
*** adi___ has joined #openstack-powervm10:28
*** miltonm has joined #openstack-powervm10:28
*** dwayne has joined #openstack-powervm10:28
*** openstackgerrit has joined #openstack-powervm10:28
*** toan has joined #openstack-powervm10:28
*** AlexeyAbashkin has joined #openstack-powervm10:28
*** thorst has joined #openstack-powervm11:22
*** seroyer has joined #openstack-powervm11:31
*** thorst has joined #openstack-powervm11:52
efriedthorst, you cherry-pickin' this morning?12:49
thorstefried: I will12:57
efriedI'll be waiting with my axe, George Washington style.12:58
*** apearson has joined #openstack-powervm12:59
viclarsonit works14:04
viclarsonwe have some problem14:04
thorstheh - what's up14:04
viclarsonaix vm provisioned by openstack14:05
viclarsoncant see attached volumes14:05
viclarsonuntil reboot14:05
viclarsoncfgmgr doesn't help14:05
viclarsonuntil first reboot14:05
thorstviclarson: are you using vSCSI I assume?14:05
viclarsonafter first reboot14:06
thorstand the volumes were attached post spawn or on original boot?14:06
viclarsondoesn't matter14:06
thorstSSP for nova ephemeral tho?14:06
viclarsonpost spawn14:07
viclarsonafter start vm14:07
thorstcan you try on VM boot?14:07
viclarsondoesn't matter after or before cloudinit14:07
viclarsonon vm boot?14:08
viclarson1) after rmc connected14:08
viclarson2) after cloud-init finished14:08
viclarsonbefore rmc connected were troubles but workarounded14:09
thorstyeah...well both of those would need to be done.14:09
viclarsoni like my english14:09
thorstRMC should connect via the MGMTSWITCH, should be seamless.14:09
viclarsonif rmc dont connected volumes cant be attached14:09
viclarsonbut they are attached14:10
thorstright, I'm curious why RMC wasn't seamless for you...but that's a different discussion14:10
thorstI think we'd have to ask someone from the AIX team to look at that.14:10
thorstapearson: do you know anyone that could help?14:10
viclarsoni saw wrong ct_node_id can affect volume connection14:13
viclarsonbut it seems all attached correctly14:13
viclarsonand simple reboot repairs it14:13
viclarsonmb reinit of rmc d be done before network configuration?14:14
thorstviclarson: rmc is just a communication/mgmt shouldn't really affect boot process14:16
thorstI think we need an AIX guru here...just trying to locate one14:16
efriedthorst, did you find an AIX person yet?14:36
efriedStart with Veena or Uma.14:37
efriedviclarson, is the chdev command hanging?  If it's holding an ODM lock, it could certainly hold up cfgmgr.  But then cfgmgr oughtta hang.14:37
viclarsonefried, no14:40
viclarsonefried, no hangs14:40
efriedviclarson, okay.  So let me make sure I'm understanding: you've got a volume that's attached to your AIX VM, but it's not showing up inside the VM?14:41
thorstunless you reboot14:42
thorstindicating it's wired correctly.14:42
efriedOh, it appears on reboot?  Yeah, that's odd.  Even without RMC, cfgmgr oughtta discover it.14:42
efriedthorst, viclarson: what kind of volume is it?14:44
thorstvscsi FC14:44
efriedviclarson, what's the oslevel of the AIX LPAR?14:47
thorstlooks like they don't use glance v1....which is good14:53
adreznecFor the other Xen CI14:53
adreznecFirst one was libvirt Xen, second is XenAPI Xen14:54
adreznecthorst: esberglu I actually just realized that we don't publish our conf files (nova.conf, neutron.conf, etc) for our CI runs14:56
adreznecWe should probably consider doing that as a backlog item14:56
thorstadreznec: yeah...maybe something qingwu could add?14:56
efriedthorst, viclarson: It sounds like sans RMC, the attach could succeed, but if the DLPAR stuff doesn't happen, the reboot would be required.14:59
efriedI didn't quite follow the discussion above - was RMC alive when the attach was performed?14:59
viclarsonbut rmc was reconfiged since vm boot15:00
efriedBefore the attach?15:01
viclarsonin one case before15:01
viclarsonin second after15:01
viclarsonthe same result15:01
viclarsonefried,  DLPAR stuff? what d happen?15:03
efriedI'm really not very well versed on that aspect.  apearson, yt?15:05
apearsonviclarson / efried - Why RMC isn't coming up is a mystery.  After you do the deploy, can you run 'ctsnap' on the AIX lpar and the same command on the NovaLink LPAR and drop the logs somewhere where I can take a look?15:06
efriedSounded to me like RMC was okay15:07
viclarsonrmc comed up15:07
*** mdrabe has quit IRC15:08
efriedapearson ^^  So if RMC was okay, DLPAR stuff should've happened, right?15:08
viclarsoni just see volume attached to lpar from vios side, but dont see from lpar15:09
viclarsonuntil reboot15:09
efriedIf RMC glitched in the middle, but the LPAR was still active, would we fail the attach, or would we proceed and just skip the DLPARy stuff?15:09
apearsonviclarson - if the volume attach was happening AFTER the VM was booted, a cfgmgr will be needed on the VM to re-discover it.15:09
efriedapearson - right - cfgmgr didn't discover it - only reboot did.15:10
efriedThat's the mystery.15:10
viclarsonyes, but cfg cant re-discover until reboot15:10
apearsonbut looks like you said cfgmgr didn't do absolutely positive?15:10
efriedSounds like this is reproducible too.  viclarson, happens every time, or intermittent?15:10
viclarsonevery time15:11
efriedviclarson, have you tried the same thing on other types of LPARs?15:11
viclarsonother types?15:12
viclarsonother oses?15:12
apearsonefried - checking with Pafumi to see if they've ever seen that sort of behavior before on the vscsi15:12
viclarsontried with ubuntu15:12
viclarsoni think it worked15:12
efriedapearson, cool.  I talked to Uma, and am chatting with Ruzek as well.15:13
efriedviclarson: If you can reproduce, then immediately run15:13
efriedalog -otcfg > /some/file15:13
efriedthen send us the file.15:13
apearsonviclarson - what kind of disk is the volume?   And did you apply 'vios rules' when you installed the VIOS?15:13
efried(that command runs on the LPAR)15:13
apearson'rules -o apply' I believe...15:13
viclarsoni didn't install vios15:14
viclarsonit came to me already installed15:14
apearsonAnd what kind of disk is the volume?15:15
viclarsonvscsi FC15:15
viclarsonis it answer?15:15
apearsonYup - ok, so the issue is likely with max transfer sizes conflicting on the disk that you add.    So that means as the VIOS discovers a new disk it may get a different transfer size than the disk you already have assigned to the lpar...and if they differ, it gets masked off.15:17
apearsonSo - apply the VIOS rules...this will make it so that all new disks will get the same transfer size, and this problem should go away.15:18
viclarson-f rulesFile15:21
viclarsonwhat rules file?15:21
openstackgerritEric Fried proposed openstack/nova-powervm: Change devstack pypowervm branch to develop
openstackgerritEric Fried proposed openstack/networking-powervm: Change devstack pypowervm branch to develop
openstackgerritEric Fried proposed openstack/ceilometer-powervm: Change devstack pypowervm branch to develop
efriedapearson - see viclarson question above15:31
thorstefried: lol...what a miss.15:33
apearsonviclarson - skip the -f...use the default one.15:39
apearsonso just 'rules -o deploy'15:39
viclarsonA manual post-operation is required for the changes to take effect, please reboot the system.15:40
viclarsonare you sure?15:40
openstackgerritMerged openstack/nova-powervm: Change devstack pypowervm branch to develop
openstackgerritMerged openstack/ceilometer-powervm: Change devstack pypowervm branch to develop
viclarsonmb this is logs15:42
viclarsonhow to safety reboot it?15:42
viclarsonis it possible?15:42
viclarsonman who can recover it on vacation now15:43
openstackgerritMerged openstack/networking-powervm: Change devstack pypowervm branch to develop
apearsonso is this a single VIOS?  If so, power off all your other lpars first...if it's dual, you should be able to just shutdown -r.  Make sure you have console access...15:46
apearsonviclarson ^^15:46
viclarsonapearson, two vioses15:54
apearsonok - in that case rebooting one of them should be fine as long as I/O is all redundant...15:55
viclarsonapearson, after reboot of vms all attached devices was detected. vios dont mask devices if vm rebooted?15:57
viclarsonis it that problem?15:57
thorstefried: mind re-reviewing 4046?16:21
thorstwe've got a team waiting on it16:21
efriedthorst, ack.16:30
efriedthorst, looks like you need to fix sonar, tho16:31
thorstsonar: "I don't like the color of this character"16:33
thorstme: what?16:33
efriedthorst, you can ignore the unused args - that rule is nonfatal.16:34
efriedBut you need to get rid of the unused class.16:34
thorstyeah, it found something this one time16:34
thorstnext rev is up16:35
*** esberglu has joined #openstack-powervm17:21
apearsonviclarson - on reboot, the client is able to re-negotiate the transfer size to match the lowest listed...17:25
openstackgerritDrew Thorstensen (thorst) proposed openstack/nova-powervm: Separate out glance read logic
thorstkriskend: ^^17:54
thorstI would ask you take a look at that...17:54
thorstI think it'll make debug of the streaming issue easier.17:55
kriskendgoing to load the patch now18:12
kriskendwaiting for Steve to finish setting up l2pop and then I will try again18:21
kriskendthorst: So much output...18:55
thorsto, you have debug on18:55
thorstaren't the chunks like 16 meg tho?18:55
kriskendwe restacked... shouldn't that have blown away my glance debug setting in nova.conf18:56
kriskendwatching the screen might give me a seizure18:57
thorstooo, I want to see18:57
thorstPM me the compute node18:57
thorstwell....  kriskend the chunks are significantly smaller than I expected...19:02
thorstI thought 16M, not 65K19:02
kriskendlots of chunks19:02
thorstfunky chunks19:03
kriskendit is spewing chunks19:03
kriskendso sad that irc does not have giphy19:05
thorstor a blessing19:05
kriskendyeah..probably best19:05
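
A minimal sketch of the chunk-size point above: if an image stream arrives in roughly 64K pieces but the consumer would rather handle 16M blocks, a small generator can regroup them. This is not the nova-powervm code; the name `coalesce_chunks` and the default target size are assumptions made for illustration.

```python
def coalesce_chunks(chunks, target_size=16 * 1024 * 1024):
    """Regroup an iterable of small byte chunks into larger blocks.

    Hypothetical helper: yields blocks of exactly target_size bytes,
    followed by one shorter final block if any data is left over.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        # Emit as many full blocks as the buffer currently holds.
        while len(buf) >= target_size:
            yield bytes(buf[:target_size])
            del buf[:target_size]
    if buf:
        # Flush the remainder once the source is exhausted.
        yield bytes(buf)
```

Logging once per regrouped block rather than once per 64K read would also cut the debug output considerably.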
kriskendso started the spawns 15 minutes ago... none are to the point of booting yet19:07
thorstwell, that much debug is bound to slow it way down19:07
thorstbut it looked to me like there were 6 reads going on in the logs?19:07
thorstsome that started at 13:51, and another round of three at 14:01?19:08
kriskendError: Failed to perform requested operation on instance "KAx-4", the instance has an error status: Please try again later [Error: ('Connection broken: IncompleteRead(7395 bytes read)', IncompleteRead(7395 bytes read))].19:11
thorstyeah...that's new....19:12
thorstall at the same time19:12
kriskendthe 3rd batch of 3 all started and failed around the same time19:12
kriskendand the very last one never starts19:12
thorstyeah...but aren't you glad we have all those chunks in there?19:13
thorstnow we know where it failed19:13
kriskendthat is true19:13
kriskendi am just happy to be making any progress on this issue19:13
seroyerWas watching iostat at the time.  All three made it to 1600 bytes written.19:13
seroyerWhich is different from before, when there were no bytes written.19:14
seroyerthorst: The ones that failed?  It actually detected the failures this time and cleaned up the VMs.  The socat processes are gone, but the LVs are still there.  There is a socat for the last one, but no writes to it.19:18
kriskendso my vote is that this is better than before :-)19:22
seroyerI see that it did a couple of reads: Read chunk 1 from glance image.  Saw up to chunk 3.  Nothing after that.19:24
thorstI saw up to chunk 24 on one19:26
thorstseroyer: can we do another deploy?  do they get hung?19:27
kriskendwithout cleaning up the existing?19:28
thorstcause we want to see if it hangs...right?19:28
thorstsee if we're past the deadlock?19:28
kriskendkicked one off19:29
thorstseems like we're past the deadlock?19:29
kriskendlooks like it is going..19:29
kriskendso it is something in just continuing on with the existing set of deploys...weird19:30
thorstnot sure.  I kinda wonder if the two closes I had there screwed things up19:30
thorstso I'm going to push 4046 through19:31
thorstthen work on this GlanceReader side of things19:31
seroyerFYI: I saw 4 IncompleteRead error stacks in n-cpu.log.  Need to correlate them to VMs…19:32
seroyerI think that the last read from glance for the 10th VM happened at almost the exact same time as 3 of the IncompleteRead errors:  Wonder if n-cpu lost contact with glance?  Network hiccup?19:34
thorstseroyer: yeah...that's exactly what I was thinking19:36
thorstthese all happen right at the same time19:36
thorstbut you're on a private network....19:36
kriskendis there any extra debug we can turn on on the controller?19:37
thorstif your switch was something modern....I could hop on and look19:37
seroyer2016-10-13 14:10:37.752 fail /opt/pvm-rest/data/fileupload/673bd467-1296-4194-9180-97e356783c5d19:37
seroyer2016-10-13 14:10:38.927  read chunk 3 for VM 1019:37
adreznecStill waiting for that replacement switch thorst19:37
seroyer2016-10-13 14:10:38.929 fail /opt/pvm-rest/data/fileupload/b0d3cea3-5c83-4454-bf99-fc71cbf0d58819:37
seroyer2016-10-13 14:10:38.933 fail /opt/pvm-rest/data/fileupload/2fa70990-ddee-4eb9-a353-b32170934eee19:37
seroyer2016-10-13 14:10:38.936 fail /opt/pvm-rest/data/fileupload/18d04270-6cd4-4517-aeb8-7d7d22ac3e2d19:37
thorstI installed iftop and ifstat on your compute nodes19:37
thorstyou had some odd fluctuations of traffic during the downloads19:38
kriskendok so what is the next step?19:54
kriskendthorst: did you find anything in the controller logs?19:55
thorstkriskend: not yet19:55
kriskendOk trying the new code out on another system that multiple deploys normally work fine on19:56
thorstefried: had a q on 421220:11
efriedthorst, yeah, motivation was actually so clbush could use it in pvmctl.20:11
efriedHence public.20:11
thorstlooked kinda like a silly method20:11
efriedYeah, it was frustrating us in pvmctl because clbush was doing a bunch of logic that .marshal was already doing - but not *all* of it.20:12
thorstthat method seems really weird20:12
thorstvery dependent on what 'val' is into it20:12
efriedSo he either had to keep his duplication, or use .marshal and then reverse engineer the result to get it *back* to the input list if it had been sanitized, kind of thing.20:13
thorstand in some cases it checks upper cased versus other checks doesn't look into casing20:13
efriedYeah, that's the whole point of the method - if 'val' is one of the consts (or a list containing just one of those), it sanitizes the const and returns it.  Otherwise, returns just the original list.20:13
efriedIn other words, "take something the user input as a list or string, and turn it into a value you can stuff into the .bld method blindly"20:14
thorstso if they type in 'all' you'll convert it to ALL20:14
efriedOh, actually, no, you're right20:15
efriedIf they send in 'all', we fail.  But if they use ['all'] we return ALL.  That's bad.  Will fix.20:15
thorstbut if they pass in ['all', 'none'], you return ['all', 'none']20:16
thorstwhich is weird as well.20:16
thorstmaybe doc it as well a little more20:16
efriedYeah, that'll get sanitized later by .bld in this case, but I was tempted to run the sanitizer in this method.  I think that's appropriate.20:16
efriedAlthough 'all' will still sanitize as a valid MAC address.  We don't check *too* thoroughly.20:16
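
A rough sketch of the behavior being discussed, assuming the fix treats a bare string and a one-element list the same way. The function name `normalize_allowed` and the constants are hypothetical; this is not the pypowervm method under review in 4212.

```python
ALL = "ALL"
NONE = "NONE"
_KEYWORDS = {ALL, NONE}


def normalize_allowed(val):
    """Return a keyword constant for a lone keyword, else a list.

    Hypothetical helper: accepts a string or a list of strings.
    'all' and ['all'] both become 'ALL'; anything else is returned
    as a list, leaving per-entry validation to a later step.
    """
    items = [val] if isinstance(val, str) else list(val)
    if len(items) == 1 and items[0].upper() in _KEYWORDS:
        return items[0].upper()
    return items
```

In this sketch a mixed input such as ['all', 'none'] is passed through unchanged, so a later sanitizer would still have to reject it.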
thorstyou and your weird back to writing my funky_chunky_iter20:17
efriedooo, looking forward to reviewing that one.20:18
kriskendthorst: what log file would you expect to see interesting info for glance in?20:29
thorstkriskend: yeah, that's where I was looking, but there wasn't anything interesting in there.20:30
efriedthorst, can I validate a MAC address as always being twelve hex digits, upper or lower, optionally separated by colons every two digits?20:30
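
One way the check in that question could look, as a sketch only: a 48-bit MAC address is twelve hex digits, and this version accepts either the bare form or six colon-separated pairs, not a mix of the two. The name `is_valid_mac` is an assumption, not an existing pypowervm function.

```python
import re

# Twelve hex digits, either bare or as six colon-separated pairs.
_MAC_RE = re.compile(
    r"(?:[0-9A-Fa-f]{12}|(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)


def is_valid_mac(value):
    """Return True if value looks like a MAC address (hypothetical helper)."""
    return isinstance(value, str) and _MAC_RE.fullmatch(value) is not None
```

With a check like this, a keyword such as 'all' would no longer pass as a MAC address.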
kriskendIs there a way to turn on more interesting glance logs on the controller?20:31
efriedkriskend: in the conf (whichever one the glance CLI is using), you can set debug = True.  That'll get you DEBUG for the whole world.  Or you can use default_log_levels (see conf template comment for examples)20:33
kriskendhmm glance api conf already has debug = True20:38
kriskendi guess glance just doesn't log very interesting things20:41
openstackgerritDrew Thorstensen (thorst) proposed openstack/nova-powervm: Separate out glance read logic
thorstkriskend efried: ^^20:41
thorstthough I'm not sure I want that debug statement in that tight loop...20:41
kriskendSo with your patch, a VM hung in spawning on the system that multiple deploys used to always work on ...20:50
thorstI don't comprehend.20:51
thorstefried: FYI - nova meeting in ten20:51
efriedthorst, okay, remind me again then.20:52
efriedIn #openstack-meeting?20:52
*** edmondsw has quit IRC20:58
thorstand we're off.  adreznec FYI21:00
* adreznec hides under table21:01
openstackgerritTaylor Jakobson proposed openstack/nova-powervm: Create snapshot of VM
efriedthorst, aw shoot, I missed it.21:46
efriedthorst, looks like it went relatively well.  Even Still wasn't in total veto mode.21:50
thorstefried: best we could hope for21:50
efriedthorst, made improvements to 4212.  I like it better now.21:51
efriedaaaand I'm out.  Later y'all.21:52
