14:06:24 <edmondsw> #startmeeting PowerVM Driver Meeting
14:06:24 <openstack> Meeting started Tue Jul 17 14:06:24 2018 UTC and is due to finish in 60 minutes.  The chair is edmondsw. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:06:24 <chhagarw> edmondsw: while testing the iscsi changes on devstack, there were some issues
14:06:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:06:28 <openstack> The meeting name has been set to 'powervm_driver_meeting'
14:06:47 <edmondsw> #link https://etherpad.openstack.org/p/powervm_driver_meeting_agenda
14:07:04 <edmondsw> ping efried gman-tx mdrabe mujahidali chhagarw
14:07:24 <esberglu> o/
14:07:26 <edmondsw> getting started a bit late... I got too caught up in reviewing the device passthrough spec
14:07:27 <efried> ō/
14:07:33 <efried> heh, good news
14:07:35 <edmondsw> I'm sure efried will mind that :)
14:07:48 <edmondsw> #topic In-Tree Driver
14:08:04 <edmondsw> #link https://etherpad.openstack.org/p/powervm-in-tree-todos
14:08:18 <edmondsw> I don't know of anything to discuss here... anyone else?
14:09:03 <edmondsw> I don't believe we've made any more progress on the TODOs there
14:09:12 <edmondsw> everything on hold as we focus on other priorities
14:09:21 <edmondsw> #topic Out-of-Tree Driver
14:09:35 <edmondsw> #link https://etherpad.openstack.org/p/powervm-oot-todos
14:10:20 <edmondsw> chhagarw I saw that you posted a new patch set for your iSCSI work... is that based on something you found with devstack?
14:10:33 <edmondsw> I think you were starting to say that right as I started this meeting
14:10:57 <edmondsw> in which case... good example of why we wanted devstack testing, and thank you for doing that
14:11:15 <chhagarw> yeah, while testing with devstack a couple of issues were found
14:11:33 <chhagarw> i am re-verifying on pvc now, will keep u posted
14:11:49 <edmondsw> as an aside on that... I am trying to spend spare cycles here and there improving our example devstack local.conf files in nova-powervm based on things I've been learning from chhagarw's environment and the CI
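For context, a devstack local.conf that exercises the out-of-tree driver is built around the project's devstack plugin; a minimal sketch (the repo URL and layout here are assumptions for illustration, not the actual example files in nova-powervm):

```
[[local|localrc]]
# Minimal sketch: enable the nova-powervm devstack plugin
# (URL/branch are illustrative assumptions)
enable_plugin nova-powervm https://git.openstack.org/openstack/nova-powervm
```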
14:12:12 <edmondsw> chhagarw I think the last patch set I saw still had pep8 errors, so make sure you iron those out
14:12:43 <chhagarw> yeah i am updating
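As an aside, the pep8 failures flagged on the patch set can be reproduced locally with the project's standard tox environment before pushing the next revision:

```
# Run the same style checks the gate runs
tox -e pep8
```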
14:12:59 <edmondsw> I had a conversation with mdrabe about the MSP work. I hope he's getting to that here soon
14:13:18 <edmondsw> mdrabe any comments there?
14:13:56 <mdrabe> edmondsw: I'll be syncing the conf options, but the migration object in pvc will remain the same
14:14:13 <edmondsw> we can talk about pvc in other forums
14:15:08 <edmondsw> I added a section in our TODO etherpad about docs
14:15:27 <edmondsw> basically, readthedocs builds are failing since stephenfin's changes
14:16:23 <edmondsw> efried I also figured out how to register in readthedocs to be notified when a docs build fails... thought you might be interested to do the same
14:16:50 <efried> edmondsw: I would rather get us into docs.o.o.  Is that possible?
14:17:06 <edmondsw> I believe so, and it's on the TODO list
14:17:25 <edmondsw> in fact, I think that may be the only way to solve the current build issues, short of reverting some of what stephenfin did
14:17:30 <edmondsw> which I'd rather not do
14:18:00 <edmondsw> that will probably be the next thing I try
14:18:16 <edmondsw> that == moving to docs.o.o
14:18:56 <edmondsw> while we're talking about docs builds... I also noticed that one of our stable docs builds is broken, and all of our EOL tagged docs builds are broken
14:19:18 <edmondsw> lower priority, but also need to be fixed
14:19:19 <edmondsw> I hope we can also move them to docs.o.o but I'm not sure on that
14:19:30 <chhagarw> edmondsw: I want the updated code to be reviewed once from the LPM change perspective
14:19:32 <efried> docs.o.o is latest only, I thought.
14:19:43 <edmondsw> efried no, it has older stuff too
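For anyone debugging the docs failures mentioned above, the build can usually be reproduced locally with the standard OpenStack docs tox environment (assuming nova-powervm carries the usual docs env, which is an assumption here):

```
# Reproduce the readthedocs build failure locally
tox -e docs
```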
14:20:27 <edmondsw> chhagarw I'll try to look later today
14:21:02 <edmondsw> anything else to discuss OOT?
14:21:44 <edmondsw> #topic Device Passthrough
14:21:55 <edmondsw> Eric has a couple things up for review:
14:22:14 <edmondsw> #link https://review.openstack.org/#/c/579359
14:22:26 <edmondsw> #link https://review.openstack.org/#/c/579289
14:22:42 <edmondsw> efried I've started commenting on both in parallel
14:22:54 <edmondsw> efried what do you want to add here?
14:23:12 <edmondsw> (i.e. I'm done talking, take it away)
14:23:50 <efried> Reshaper work is proceeding apace. Once that winds down, I'll probably be looking in nova (resource tracker and report client) to make sure nrp support is really there; as well as working through more of that series ^
14:24:23 <efried> mdrabe: We're counting on you to be our second core reviewer for this series, in case you didn't have enough to do.
14:25:10 <edmondsw> mdrabe I know you have other things to focus on atm... probably let me get my comments up today first, and then look at it
14:25:42 <mdrabe> sounds good
14:26:49 <edmondsw> efried anything else?
14:26:53 <efried> no
14:26:57 <chhagarw> mdrabe: if you can have a look at https://review.openstack.org/#/c/576034/, I want you to check that this change does not impact NPIV LPM
14:27:10 <edmondsw> #topic PowerVM CI
14:27:34 <edmondsw> #link https://etherpad.openstack.org/p/powervm_ci_todos
14:27:58 <edmondsw> we've been having some CI stability issues that mujahidali is working on
14:28:03 <mdrabe> chhagarw: will do
14:28:08 <edmondsw> I've helped some there, as has esberglu
14:28:24 <esberglu> Here's what I think is going on with CI. The underlying systems are a mess. Filesystems are full, vios issues, etc.
14:28:30 <edmondsw> yes
14:28:31 <esberglu> Everything else is just a symptom of that
14:28:36 <edmondsw> agreed
14:28:49 <edmondsw> question is how best to fix it
14:28:53 <mujahidali> I looked into neo-21 and found that pvmctl was not working, so I restarted the neo followed by pvm-core and pvm-res
14:28:54 <mujahidali> and after that pvmctl worked for neo-21
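For reference, the recovery sequence described above amounts to restarting the NovaLink services and confirming pvmctl responds again; a sketch, assuming the services are systemd-managed as on NovaLink installs:

```
# Restart the NovaLink services when pvmctl stops responding
sudo systemctl restart pvm-core
sudo systemctl restart pvm-res
# Verify pvmctl is back, e.g. by listing partitions
pvmctl lpar list
```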
14:29:21 <mujahidali> I have cleared the other neo systems as suggested by esberglu but still no luck, so I decided to manually clear the ports. But it seems that after clearing them manually they are not coming back to an active state
14:29:44 <esberglu> The ports are just going to keep failing to delete until the underlying issues are resolved
14:30:03 <esberglu> Are the /boot/ directories still full on some of the systems?
14:31:10 <mujahidali> management nodes and most of the neos have only 30% filled /boot/ directories
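A quick way to confirm /boot usage across the systems under discussion (host names are the ones mentioned in the log; the loop itself is just an illustration):

```
# Check /boot usage on the neos mentioned above
for host in neo-21 neo-26 neo-30; do
    ssh "$host" df -h /boot
done
```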
14:32:04 <mujahidali> esberglu: do we need to redeploy the CI again after the cleanup and neo restart?
14:32:44 <esberglu> Have you been restarting neos? If you restart them you need to redeploy them
14:33:00 <esberglu> And yes if they are broken because of full filesystems they need to be redeployed
14:33:35 <edmondsw> esberglu is it possible to redeploy a single neo, or do they all have to be redeployed as a group?
14:33:43 <mujahidali> so, will redeploying the cloud_nodes or only the management_nodes do the job?
14:34:31 <esberglu> You should deploy the compute_nodes and the management_nodes
14:34:40 <mujahidali> okay
14:35:05 <esberglu> You can redeploy single nodes using the --limit option; I've given mujahidali instructions on that before, but let me know if you need help with that
14:35:25 <esberglu> At this point it's probably better to redeploy all of them though
14:35:26 <mujahidali> sure
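The redeploy being discussed is an Ansible run; a sketch of the full vs. single-node case (the playbook name is a placeholder, only the --limit usage and the node groups come from the discussion):

```
# Redeploy everything: management and compute nodes
ansible-playbook site.yml

# Or redeploy just one broken system with --limit
ansible-playbook site.yml --limit neo-21
```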
14:36:05 <edmondsw> mujahidali have we fixed the VIOS issues?
14:36:27 <edmondsw> and you said "most" of the neos have only 30% filled in /boot... what about the others?
14:36:40 <mujahidali> for neo-26 and neo-30 ??
14:36:45 <edmondsw> yes
14:36:57 <openstackgerrit> Eric Fried proposed openstack/networking-powervm master: Match neutron's version of hacking, flake8 ignores  https://review.openstack.org/582686
14:36:57 <openstackgerrit> Eric Fried proposed openstack/networking-powervm master: Use tox 3.1.1 and basepython fix  https://review.openstack.org/582404
14:36:59 <edmondsw> I want to address as much as we can before you redeploy to increase our chances of that fixing things
14:37:32 <mujahidali> I am not getting what exactly went wrong with neo-26 and 30
14:38:13 <edmondsw> ok, let's try to look at that together after this meeting, before you redeploy
14:38:19 <mujahidali> they (neo-26 and 30) have sufficient /boot/ space as well
14:38:43 <mujahidali> edmondsw: sure
14:38:52 <edmondsw> anything else to discuss here?
14:39:12 <esberglu> Yeah I have some stuff
14:39:27 <esberglu> mujahidali: Have you created all of the zuul merger nodes?
14:39:37 <esberglu> So that I can stop maintaining mine soon?
14:40:23 <mujahidali> I want to try them with today's deployment for prod
14:41:16 <mujahidali> so let me deploy prod with the new zuul mergers, and if all goes well then you can free up yours
14:42:30 <esberglu> mujahidali: Please propose a patch with the changes
14:42:47 <mujahidali> sure
14:43:11 <esberglu> mujahidali: edmondsw: What's the status on vSCSI CI for stable branches? I think last I heard ocata was still broken there. I gave some suggestions
14:43:21 <esberglu> Is it still broken with those?
14:43:43 <esberglu> Is it worth moving forward with vSCSI stable CI for pike and queens only and skipping ocata for now?
14:43:55 <edmondsw> I thought we were going to split that commit into 1) pike and newer 2) ocata so that we could go ahead and merge #1
14:44:01 <edmondsw> but I haven't seen that done yet
14:44:19 <mujahidali> I am able to stack it now with changes esberglu suggested
14:44:24 <edmondsw> yay!
14:44:38 <mujahidali> but there are 3 tempest failures
14:45:03 <edmondsw> mujahidali ping me the details after the meeting
14:45:09 <edmondsw> and we can work through that
14:45:22 <edmondsw> after we work through the other thing
14:45:57 <mujahidali> okay
14:46:44 <esberglu> edmondsw: mujahidali: There were a bunch of additional neo systems that we had slated for the CI pool. Did those ever get set up?
14:46:57 <mujahidali> no
14:47:31 <edmondsw> because we've been focused on other things, or is there another reason?
14:47:54 <mujahidali> the CI was breaking very frequently, so I didn't get a chance to look at it.
14:48:47 <edmondsw> I think that's understandable... keeping the CI running takes priority
14:49:02 <esberglu> Last thing on my list was multinode CI. Any questions for me there mujahidali? I'm guessing not much work has happened there either with the CI stability issues
14:49:57 <mujahidali> I redeployed the staging CI using the changes suggested by esberglu for multinode
14:51:03 <edmondsw> and?
14:52:01 <mujahidali> the jenkins job failed. Can I paste the log link here?
14:52:24 <edmondsw> no, that's another thing we can talk about in slack
14:52:30 <mujahidali> sure
14:52:52 <edmondsw> I think that's it for CI?
14:53:14 <esberglu> All for me
14:53:38 <edmondsw> #topic Open Discussion
14:54:02 <mujahidali> I will be OOO next monday.
14:54:07 <edmondsw> I meant to bring this up when we were talking about OOT, but efried has fixed our specs so they build now. Thanks efried
14:54:17 <edmondsw> mujahidali got it, tx
14:55:18 <edmondsw> #endmeeting