Tuesday, 2017-01-24

openstackgerritzhangyanxian proposed openstack/networking-powervm: Fix typos in sea_agent.py  https://review.openstack.org/42439400:54
thorst_efried: in our driver mtg, I'd like to talk for a few minutes on CI stability around VM power off.13:53
efriedWe starting that now?14:02
efriedadreznec: thorst_: (wangqwsh esberglu) we having a meeting?14:04
*** wangqwsh has joined #openstack-powervm14:05
adreznecStill not seeing esberglu efried. Is he out or just running late?14:07
efriedNot online.  Slacked him.14:08
*** mdrabe has joined #openstack-powervm14:08
*** apearson has joined #openstack-powervm14:08
adreznecHmm ok14:08
*** tblakes has joined #openstack-powervm14:09
efriedadreznec: thorst_: wangqwsh: He's on his way in now.  Forgot his badge.14:14
adreznecWell we can start without him and back up to CI when he arrives14:15
adreznec#startmeeting powervm_drver_meeting14:15
openstackMeeting started Tue Jan 24 14:15:12 2017 UTC and is due to finish in 60 minutes.  The chair is adreznec. Information about MeetBot at http://wiki.debian.org/MeetBot.14:15
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:15
openstackThe meeting name has been set to 'powervm_drver_meeting'14:15
adreznec#topic In-tree driver status14:15
adreznecLets start here. I'll turn things over to you efried14:15
*** esberglu has joined #openstack-powervm14:16
esbergluHey guys sorry I'm late, forgot my badge14:17
efriedI'm slowly working my way through the early in-tree change sets.14:17
adreznecnp esberglu, just fired up the meeting. talking in-tree driver status first14:17
efriedFirst one is for sure ready for wider review; we're just waiting for in-tree CI before we can put pressure on the cores to review.14:18
efriedIf we can get the CI up today or tomorrow, there's a chance we can get mriedem to do a review before the nova meeting on Thursday.14:18
efriedWhat's the o-3 date?  Is it Thursday or Friday?14:18
adreznecefried: It's up to projects a bit. Most will be Thursday, but Jan 23-27 is the official range14:19
efriedWell, okay, this is nova we're talking about.14:19
efriedConfirmed Thursday.14:20
efriedSooo... we pretty much miss ocata if we don't have the CI up today.  Cause I don't think we get the change merged on the first pass.14:20
thorst_efried: I think we've missed Ocata.  :-)14:20
thorst_for the intree.14:20
*** smatzek_ has quit IRC14:21
efriedLet's work as if there's still a chance.14:21
efriedRemainder of driver status is piddly details.  I can go more in depth if you want, but we should probably spend the time on more important stuff.14:21
efried*in-tree driver status14:22
thorst_I have oot driver talk, but I suspect that's a different part of meeting14:22
efriedyuh, suggest waiting til after we talk in-tree CI.14:22
efriedAnything else before we move on to that?14:22
adreznecNothing here14:22
adreznec#topic In-tree driver CI status14:23
adreznecesberglu, the floor is yours14:23
esbergluThe in-tree driver is failing to stack. "n-cpu service is not running" for some reason14:23
esbergluProblem is that the logs are failing to copy14:23
esbergluSo i'm running one manually14:23
esbergluTo see what the deal is14:23
adreznecesberglu: That's weird. Is it just not getting far enough to transfer somehow?14:24
adreznecOr is there actually an scp failure?14:24
*** apearson has quit IRC14:24
esbergluNah it's actually a SCP failure. Which is weird because I didn't change anything for SCP14:25
esbergluTrying to create /srv/static/logs/5914:25
esbergluIt fails trying to create that14:25
esbergluthe log server isn't full or anything though14:25
esbergluI will let you guys know the results of the manual run when it finishes14:26
esbergluThe other thing I had14:26
adreznecHmm odd. Nothing in that flow should have changed unless something is broken in the build variables somehow14:26
esbergluThere's a "test connection" thing in the configure system, and it is connecting to the log server fine14:27
esberglu3 scripts run as part of the CI setup14:27
esberglu1) prepare_node_powervm14:27
esberglu2) ready_node_powervm14:27
esberglu3) prep_devstack14:27
esbergluPreviously we would install the patched (develop) pypowervm in prepare_node_powervm14:27
esbergluSince we are now using 2 different versions of pypowervm ( for in tree, develop for oot) I moved the installation to prep_devstack14:27
esbergluThe problem is that we need the patched pypowervm to be installed for the ready_node script to work14:27
esbergluSo I was thinking just install the patched develop in prepare_node_powervm14:28
esbergluThen if it is in tree, just overwrite with the patched
efriedAs long as is in place before the nova compute process starts, it shouldn't matter when we do it.14:29
efriedHeck, I would even be okay skipping that wrinkle for now and just continuing to use develop14:29
adreznecefried: I think it does14:29
efriedThey're not different enough that it's going to cause failures.14:29
adreznecBecause I think we use pvmctl before that14:29
adreznecTo do node setup14:29
efriedand we can focus on getting things working first, then worry about that version switch later.14:29
esbergluYeah that was the problem. We don't know if it's in or out of tree until (3) but we need pvmctl by (2)14:30
adreznecesberglu: we could just always install develop for step 214:30
adreznecthen switch it out for the "right" version in 3 if needed?14:30
adreznecFragments it a bit, but... meh14:30
esbergluYeah. That's exactly how I have it set up right now. Seems to be working, just wanted to make sure I wasn't missing something14:30
efriedoh, okay.14:31
efriedthorst_ adreznec - can you think of an easy way to have e.g. Adapter() init log the pypowervm version?14:31
adreznecThere are other options like bundling pypowervm/pvmctl into a venv and shipping that whole thing so pvmctl has its own pypowervm to use always or something14:31
adreznecBut they're more work14:32
efriedI don't know where the version numbers are stored.  I'm sure it involves pbr or something.14:32
thorst_efried: no idea...14:32
adreznecWe'd have to make a call off to pbr's version_string method to get that14:34
efriedIn [12]: pbr.version.VersionInfo('pypowervm').release_string()14:34
efriedOut[12]: '1.0.0.dev4'14:34
thorst_could we just log the version at the end of the CI job?14:34
thorst_and call it a day there?14:34
adreznecI wonder if we should stop using pbr for pypowervm though14:34
adreznecpbr only really works well with semver14:35
efriedWell, I wanted to have a way to be sure the compute process was started with the correct version.14:35
adreznecAs you can see there14:35
adreznecSince the version probably isn't really 1.0.0.dev414:35
adreznecbut or
esbergluWe would have to log it before it gets patched14:35
efriedwouldn't think so14:36
esbergluOnce it's patched the version becomes 1.0.1devxxx14:36
esbergluI'm pretty sure14:36
efriedIf this is going to be more than a five-minute thing, then never mind; but it would be useful in the long run.14:38
efriedFor right now, like I say, I would be okay moving forward even if the compute process is still using develop.14:38
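A minimal sketch of the version logging efried asks about, assuming some startup hook near Adapter() init is available to call it from. The log above reads the version through pbr (`pbr.version.VersionInfo('pypowervm').release_string()`); this sketch swaps in the standard library's `importlib.metadata` so it carries no extra dependency. The helper names `installed_version` and `log_library_version` are hypothetical and not part of pypowervm.

```python
import logging
from importlib import metadata

LOG = logging.getLogger(__name__)


def installed_version(package):
    """Return the installed version of `package`, or 'unknown' if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"


def log_library_version(package="pypowervm"):
    """Log the version once at startup so the CI logs show which build ran."""
    version = installed_version(package)
    LOG.info("Using %s version %s", package, version)
    return version
```

Logging from the process itself, rather than at the end of the CI job, would show the version the compute process actually started with, which is the concern raised at 14:35.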
efriedI got lost.  What's the next step here?  Seeing how the local run goes, and then nailing down the scp thing?14:39
esberglu#action esberglu: Finish manual in tree run and update with results14:40
esberglu#action esberglu: Figure out why logs aren't being copied14:40
efriedesberglu: would having another body help move this along any faster, or are we bottlenecked?14:41
efriedI would be volunteering someone like adreznec who knows this stuff ;-)14:41
esbergluIf someone wants to help with the SCP thing. Nothing to do for the manual run but wait14:41
efriedk.  thorst_ is that in your wheelhouse?14:41
efriedDo you have the expertise and bandwidth to help esberglu figure out this SCP boggle?14:43
thorst_ooo, I do not.14:43
thorst_survival mode atm14:43
efriedCause I know I don't.14:43
adreznecefried: Not sure yet, still bogged down right now14:43
efriedOkay, if there's nobody with the technical chops, I'd be happy to be a sounding board and additional googler.14:44
adreznecWill depend on how things shake out with meetings really14:44
efried#action efried to help esberglu with SCP boggle, for whatever that's worth.14:44
efriedesberglu: is there anything else you can see on the horizon that will need to be addressed?  Something we might be able to get a head start on if we're stuck waiting for whatever?14:45
efriedThe big obvious thing is paring down the test list - but we don't really know where to start with that.  However, setting up the infrastructure to use a whitelist?14:45
*** k0da has joined #openstack-powervm14:46
esbergluI already know how we should do that14:46
esbergluThis is the conf we use for out of tree14:46
esbergluWe need to make a second conf for in tree14:46
thorst_well, its going to be a whitelist14:47
esbergluAnd then we set the BASE_TEST_REGEX to include all the tests we want14:47
thorst_so its only supposed to be the tests we want14:47
thorst_ahh, nm...I see14:47
esbergluYep. The BASE_TEST_REGEX for out of tree includes all the tests14:47
efriedoh, so it's already whitelisting.  It's just really inclusive.14:47
thorst_that's rough.14:47
esbergluYeah the "whitelist" for out of tree is all tests, then it gets reduced by the skip_list14:48
thorst_I'm going to nope myself out of anything with regex14:48
efriedSo the BASE_TEST_REGEX is going to be a regex with (id|id|id|id.....)14:48
thorst_I find regex to be an awful creation14:48
efriedahh, thorst_, you don't understand the beauty of regex.14:48
efriedSokay, I'm your regex guy.14:48
thorst_efried: you are correct, I find it flawed and awful14:49
thorst_but that's my definition of 'beauty'14:49
thorst_can't be awful14:49
esbergluefried: It would be easier to use test names; then we could include groups of tests with one regex. But it was recommended to use IDs before14:49
efriedYeah, however, I don't know that we really want to handle the whitelist with a regex like that.  There's probably another (better) way to do it.14:49
efriedSo - let me take another look at the os_ci_tempest.sh and see what I can figure out.  Unless esberglu has already done that?14:50
efriedI forget, which project holds the real one of those?  neo-os-ci or powervm-ci?14:50
efried#action efried to investigate whitelisting14:52
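A rough sketch of the "(id|id|id.....)" idea discussed above: build one alternation from a list of test IDs and keep only the tests whose names contain one of them. The IDs, test names, and the helper `build_whitelist_regex` are made up for illustration, and this assumes the ID appears in square brackets at the end of each test name; it is not the contents of the CI conf or os_ci_tempest.sh.

```python
import re


def build_whitelist_regex(test_ids):
    """Join test IDs into one alternation, escaping regex metacharacters."""
    if not test_ids:
        raise ValueError("whitelist must contain at least one test ID")
    return "(" + "|".join(re.escape(test_id) for test_id in test_ids) + ")"


# Hypothetical IDs and test names, for illustration only.
WHITELIST = ["id-aaaa1111", "id-bbbb2222"]
pattern = re.compile(build_whitelist_regex(WHITELIST))

tests = [
    "compute.test_servers.test_create[id-aaaa1111]",
    "compute.test_servers.test_delete[id-cccc3333]",
    "network.test_ports.test_list[id-bbbb2222,smoke]",
]

# Keep only the tests whose name contains a whitelisted ID.
selected = [name for name in tests if pattern.search(name)]
```

A long alternation like this works but gets unwieldy as the list grows, which may be why efried suspects there is a better mechanism than a single regex.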
*** smatzek has joined #openstack-powervm14:53
adreznecAll right14:53
adreznecAnything else on in-tree CI?14:53
adreznecOk, I know thorst_ had discussion on out-of-tree14:53
adreznec#topic Out-of-tree driver discussion14:54
thorst_yeah, so our oot CI is kinda flaking out again14:54
thorst_I'm seeing several patches failing...14:54
thorst_I've at least root caused one of em.14:54
thorst_that's the error.  Basically it is a non-force immediate power off.14:54
thorst_and its just hanging14:54
thorst_I think we've hit this a few times now...so we should get it fixed.14:55
thorst_I opened a bug a long time ago around this14:55
openstackLaunchpad bug 1562117 in nova-powervm "power-off times are not adhered to" [Low,New] - Assigned to Lauren Taylor (lmtaylor)14:55
thorst_I think we need some attention put on it now.  Anyone have cycles to explore that fix?14:55
thorst_I can also check with lmtaylor on it, but she's had it for a while and hasn't updated it recently.14:56
efriedsqueaky wheel14:56
thorst_well, I'll work on it as I free up14:56
thorst_but it's impacting CI.  Sigh14:56
thorst_that was about it14:57
thorst_I suspect we'll argue in the review14:57
thorst_so awareness now, this one will be a weird review...so pay attention to the review.14:57
efriedthorst_: is the problem that you think we should be timing out faster?14:58
*** jwcroppe has joined #openstack-powervm15:00
thorst_OpenStack gives us values for timeout and retry15:00
thorst_what those values mean...is open to interpretation15:00
thorst_does a 0 mean immediately or wait forevs15:00
thorst_I interpret it as 'immediate'15:00
efriedI see.15:02
efriedShould prolly go look at how libvirt et al interpret those values.15:02
efriedlibvirt agrees with thorst_15:03
thorst_well then, its easy15:03
efriedtimeout != 0 => "gracefully"15:03
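A sketch of the interpretation settled on above: a timeout of 0 means power off immediately, and any non-zero timeout means attempt a graceful shutdown first. The function name and the mode strings are hypothetical, not the nova-powervm or pypowervm API.

```python
def choose_power_off_mode(timeout):
    """Map an OpenStack-supplied timeout (in seconds) to a power-off strategy."""
    if timeout < 0:
        raise ValueError("timeout must not be negative")
    if timeout == 0:
        # Zero is read as "do not wait": hard power off right away.
        return "immediate"
    # Non-zero: try a graceful shutdown, waiting up to `timeout` seconds.
    return "graceful"
```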
thorst_anyway, I'll get on it15:03
thorst_soon enough, cause its blocking my other changes (kinda)15:03
thorst_that's all I had for OOT.  Big thanks for reviews on the fileio thing15:04
thorst_not sure if we can get that into ocata...would've been nice15:04
efriedk, if you find you don't have time, I can take over.15:04
efried#action thorst_ https://bugs.launchpad.net/nova-powervm/+bug/1562117 - efried to help if needed.15:04
openstackLaunchpad bug 1562117 in nova-powervm "power-off times are not adhered to" [Low,New] - Assigned to Lauren Taylor (lmtaylor)15:04
*** tlian has joined #openstack-powervm15:04
adreznecSo I know we're over15:04
adreznec#topic Open floor15:05
adreznecAnything else?15:05
* thorst_ dances on open floor15:05
efriednot from me, but esberglu stick around so we can talk about the whitelist.15:05
openstackMeeting ended Tue Jan 24 15:05:40 2017 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:05
openstackMinutes:        http://eavesdrop.openstack.org/meetings/powervm_drver_meeting/2017/powervm_drver_meeting.2017-01-24-14.15.html15:05
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/powervm_drver_meeting/2017/powervm_drver_meeting.2017-01-24-14.15.txt15:05
openstackLog:            http://eavesdrop.openstack.org/meetings/powervm_drver_meeting/2017/powervm_drver_meeting.2017-01-24-14.15.log.html15:05
esbergluefried: thorst_: adreznec: Turns out I was looking at the wrong run in jenkins (one of the other in tree patches)15:32
esbergluThe actual issue seems to be a tempest conf thing15:32
esbergluI'll PM you guys the link to the run15:32
efriedesberglu: Have we merged the thing that converts the skip list from name to ID yet?15:42
efriedLooks like not.  Has it been tested live yet?15:43
efriedCause I don't actually see how it could work, given the code I'm looking at in os_ci_tempest.sh15:43
esbergluNo I haven't had a chance to test it yet. I would have to put changes into the CI scripts to use that patch15:44
esbergluAnd there are already enough changes in those scripts, I didn't want it to get too messy15:44
efriedokay.  So yeah, the way we parse the skip list is to remove the square-bracketed part and match up on the base name.15:45
efriedSo work needs to be done in os_ci_tempest.sh before that'll work.15:45
esbergluCould we just move the comment with the name to a separate line?15:45
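To illustrate the parsing efried describes (drop the square-bracketed part, then match on the base name), here is a small Python sketch. os_ci_tempest.sh is a shell script whose contents are not shown in this log, so this is an illustration of the described behavior under that assumption, not a port of it; `base_name` and `filter_skipped` are hypothetical names.

```python
import re

# Matches a trailing square-bracketed suffix such as "[id-1234,smoke]".
_BRACKET_SUFFIX = re.compile(r"\[[^\]]*\]$")


def base_name(test_name):
    """Return the test name with any trailing [..] suffix removed."""
    return _BRACKET_SUFFIX.sub("", test_name.strip())


def filter_skipped(test_names, skip_list):
    """Drop every test whose base name appears in the skip list."""
    skip = {base_name(entry) for entry in skip_list}
    return [name for name in test_names if base_name(name) not in skip]
```

With matching done this way, a skip list entry that holds only an ID never equals any test's base name, so nothing would be skipped. That is consistent with efried's point that the script needs work before an ID-based skip list can be used.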
esbergluI think I may have found the source of the stack issue for in tree15:49
esbergluThe out of tree driver has this in local.conf15:49
esbergluBut the in-tree has15:49
esbergluHmm okay15:54
esbergluI might know what it is15:54
esbergluthe /srv/static/logs directory is owned by jenkins15:55
efriedis that different than in oot?15:55
esberglubut all of the /srv/static/logs/# directories are owned by powervmci (which is the username SCP uses)15:56
esbergluI wonder if we just need to change the ownership of the logs dir to powervmci15:56
esbergluI'll give it a try15:58
adreznecThat's... bizarre16:12
efriedbut cool16:13
efriedbut bizarre16:13
adreznecWonder how it was creating subdirectories fine before16:13
esbergluNo idea16:13
esbergluI redeployed the log server so maybe that changed something. But idk why it would, there haven't been any changes to the log server playbook that I know of16:14
adreznecIs this prod or staging?16:14
adreznecMaybe something to do with the 16.04 upgrade then?16:15
esbergluYeah maybe. It's also being used for the cache / dns stuff now. Again idk how that would affect it16:16
esbergluWho knows why, but at least it's working16:16
esbergluadreznec: thorst_: efried: Manual run got through the stack with the updated image url16:38
esbergluJust kicked off os_ci_tempest. Lets see what happens16:38
esbergluJust running the test list from OOT, it might take a while if there are a bunch of timeouts / failures16:41
esbergluWell mostly timeouts, we know there will be tons of failures16:41
esbergluResults are in17:21
esbergluTotal tests:    114617:21
esbergluPassed tests:    94717:21
esbergluFailed tests:     1617:21
esbergluSkipped tests:   18317:21
esbergluWtf it used the wrong local.conf17:24
esbergluI must have messed something up running it manually17:30
thorst_lol, we still passed 947 tests?17:48
esbergluIt used the wrong local.conf17:49
esbergluSo it stacked with nova-powervm enabled17:49
adreznecI thought our driver was just super awesome17:50
efriedthorst_: ack18:00
thorst_gerrit seems to be mad and won't let me add any reviewers18:01
efriedthorst_: reviewed18:09
efriedthorst_: it let me add reviewers.18:10
thorst_of course you crap on it quick18:10
efriedMust just be mad at you.18:10
thorst_yeah, I can now too18:10
efriedI didn't crap on it.18:10
efriedSprinkled a little fairy dust, that's all.18:10
efriedthorst_: Not gonna implement retry yet?18:11
thorst_read the retry18:13
thorst_it isn't a retry18:13
thorst_its a retry_interval18:13
thorst_it is actually a polling period to say 'should i check if the power off completed yet, how about now'18:13
thorst_it is essentially what our pypowervm job framework does.18:14
thorst_I don't see much value in implementing that one18:14
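A sketch of what thorst_ describes: retry_interval is a polling period, not a count of retries. The loop below checks a condition every `retry_interval` seconds until `timeout` elapses. The function name and its injectable `sleep` and `clock` parameters are assumptions for illustration; this is not the pypowervm job framework.

```python
import time


def wait_for_power_off(is_powered_off, timeout, retry_interval,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `is_powered_off` every `retry_interval` seconds until `timeout`.

    Returns True if the check succeeded before the deadline, else False.
    """
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    deadline = clock() + timeout
    while True:
        if is_powered_off():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(retry_interval, remaining))
```

If the underlying job framework already polls for completion, a second loop like this adds little, which matches the reasoning for leaving it unimplemented.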
openstackgerritDrew Thorstensen (thorst) proposed openstack/nova-powervm: Fix power off timings  https://review.openstack.org/42478018:14
efriedthorst_: dig18:15
efriedthorst_: LGTM.  Want more reviews before you push it?18:17
thorst_lets see what CI says at least18:21
esbergluMaking progress. Stacked with the right local.conf this time. Missing the18:31
esbergluConfig option in the local.conf which causes script failures after the stack18:31
thorst_esberglu: we're still getting a lot of one off failures18:40
thorst_that to me, don't look related to the actual change set18:40
thorst_would you agree?18:40
*** nbante has quit IRC18:40
thorst_alright, we will need some focus there soon to root out what our code bugs are18:42
thorst_hopefully after this power off thing merges it goes down18:42
thorst_but I bet we have a few latent bugs18:42
thorst_hammering it with this volume has been awesome18:42
thorst_efried: I'm going to push this thing through19:58
thorst_then propose back to newton19:58
thorst_also, csky gave his +1 here that you wanted: https://review.openstack.org/#/c/424290/19:59
openstackgerritMerged openstack/nova-powervm: Fix power off timings  https://review.openstack.org/42478020:08
thorst_efried: https://review.openstack.org/#/c/424835/20:37
openstackgerritMerged openstack/nova-powervm: Fix hostname & initiator for volume drivers  https://review.openstack.org/42429020:45
efriedadreznec: thorst_: esberglu: Where's a recent AIO local.conf?20:47
efried...for OOT20:47
esbergluThe one we use for CI is in neo-os-ci/ci-ansible/roles/ci-management/templates/scripts/local.conf.aio20:48
*** edmondsw has quit IRC23:38
