Tuesday, 2018-09-04

fried_riceedmondsw:  <== anything green here, I reckon we can nix from our CI runs :)01:27
fried_ricethat's 707 tests.01:27
*** fried_rice is now known as efried01:27
*** tonyb has quit IRC08:50
*** tonyb has joined #openstack-powervm09:53
efriededmondsw: buzz when avail12:08
edmondswefried hey. Was just looking at the link above12:10
efriedThe only thing those 707 tests prove is that the compute service doesn't crash. Which I'm sure is well proven by the other tests :)12:11
edmondswyep, good stuff12:12
edmondswyou gonna take a stab at any CI changes for this?12:13
edmondswI really wish there was a good way to parse this output programmatically... gonna be a pain to copy-paste manually12:14
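The copy-paste step edmondsw wishes away could plausibly be scripted. A minimal sketch in Python, assuming a testr/stestr-style results listing where passing tests appear on lines ending in "... ok" (the actual CI output format may differ):

```python
import re

def extract_passing(results_text):
    """Return the sorted, de-duplicated ids of tests that passed.

    Assumes lines of the form "<test_id> ... ok"; adjust the pattern
    to whatever the CI's results file actually looks like.
    """
    passing = set()
    for line in results_text.splitlines():
        m = re.match(r'(\S+)\s+\.\.\.\s+ok\s*$', line)
        if m:
            passing.add(m.group(1))
    return sorted(passing)
```

The resulting list could then be written straight into a skip/blacklist file instead of being copied by hand.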
edmondswefried and note that some of the green is stuff we're already skipping, rather than something we passed12:15
edmondswbut wow there is a lot of "pass" in there12:17
edmondswefried I wonder if we could work with the nova folks to mark tests relevant to compute, and then have a tempest setting that indicates you only want to run those12:20
edmondswrather than call everything out individually12:20
edmondswsince that would be relevant to a whole bunch of driver CIs, not just ours12:20
efriedThat would indeed be pretty neat.12:20
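The marking idea being floated here might look roughly like the following sketch. This is illustrative plain Python modeled on tempest's service-tag decorator idea, not tempest's actual API; the decorator and filter names are made up:

```python
def services(*service_names):
    """Attach service tags to a test function (hypothetical, modeled on
    the way tempest tags tests with the services they exercise)."""
    def wrapper(func):
        func._services = set(service_names)
        return func
    return wrapper

@services('compute')
def test_server_reboot():
    pass

@services('image')
def test_image_list():
    pass

def compute_only(tests):
    """Select only the tests tagged as touching the compute service."""
    return [t for t in tests if 'compute' in getattr(t, '_services', set())]
```

With something like this, a driver CI could ask for "tests relevant to compute" instead of maintaining a hand-built skip list.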
edmondswyou want to take that up?12:21
efriedanyway, to answer your question, I think we should have our CI guy work on any white/black list changes12:21
efriedThe other thing sounds like... a lot of work. Both to champion and to execute on.12:22
efriedtbh I don't see anyone being willing to pick it up.12:22
efriedI suppose I can mention it in IRC and see if anyone bites.12:23
efriedor maybe in the ML.12:23
edmondswI'd just hate for what you've done here to go to waste12:23
edmondswI guess we can always rerun it later if/when someone ever has time to do anything12:24
efriedoh, it consumed all of 15 minutes of my life12:24
efriednot a big deal12:24
efriedDefinitely telling12:24
efriedand I reckon it would be worth acting on in terms of making our CI smaller12:24
edmondswok, cool... saw a bunch of different patch sets12:24
efriedyeah, started with basically an empty driver and had to add a couple of basics back in to get n-cpu not to crash. But none of it was hard or time-consuming.12:25
efried...But the bigger picture of marking tests compute-only so we (and other 3rd-party CIs) could automate it...12:25
efriedI'll throw something out on the ML and see if anyone responds.12:26
edmondswnote that it wouldn't just be for 3rd party drivers... e.g. there are VMware CIs, even though they're in-tree12:26
efriednot sure where that split would be defined.12:28
edmondsw(side note: it kills me that they are fully in-tree and we're not despite them being way less open and less OpenStacky than anything Power does...)12:28
efriedmeh, we're getting there.12:28
edmondswyeah, end rant12:29
edmondsw(after several years)12:29
edmondswok, now down12:29
efriededmondsw: Where would I point if I wanted to reference "the PowerVM CI"?12:31
efriedwe have like a wiki page or something yah?12:31
edmondswif you point to the test results you just showed me, be prepared for someone to complain about some of the things we're already skipping12:32
edmondswI'll find you a link later, in meeting12:32
efriedAs I'm writing this, I'm kinda thinking of counterarguments.12:35
efriedLike, we're still running all that other shit on a Power box. So if someone else (n-cond, cinder, whoever) makes a change that breaks power, that would be how we would find out about it.12:36
efriededmondsw: Here's what I've got: http://paste.openstack.org/raw/729403/12:51
edmondswefried was it really 707 passed, or 707 = pass+skip ?13:10
efried61 failed, 289 skipped (see summary at the top)13:11
edmondswyep, cool13:12
edmondswit does not reflect well on us that only 61 failed :(13:12
edmondswI probably wouldn't draw attention to that fact13:12
efriedwhat do you mean?13:13
efriedI guess the proportion of failed:skipped doesn't give a warm fuzzy.13:14
efriedbut I think that's kind of the point. Let's say some other CI is skipping 100 tests. They think they're only skipping 10% of the tests, but really they're skipping half of the (important/relevant) ones.13:14
efriedbut, what, you think I should just stfu in case Dan decides to punish us?13:15
edmondswI was just thinking change the last sentence of your first paragraph to "The results [2] show that 707 tests still pass."13:19
efriedokay, can do.13:20
efriedotherwise, good to send?13:20
edmondswI think so13:21
*** mujahidali has joined #openstack-powervm13:49
edmondsw#startmeeting PowerVM Driver Meeting14:01
openstackMeeting started Tue Sep  4 14:01:23 2018 UTC and is due to finish in 60 minutes.  The chair is edmondsw. Information about MeetBot at http://wiki.debian.org/MeetBot.14:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:01
*** openstack changes topic to " (Meeting topic: PowerVM Driver Meeting)"14:01
openstackThe meeting name has been set to 'powervm_driver_meeting'14:01
*** esberglu has joined #openstack-powervm14:01
edmondsw#link https://etherpad.openstack.org/p/powervm_driver_meeting_agenda14:01
edmondsw#topic In-Tree Driver14:02
*** openstack changes topic to "In-Tree Driver (Meeting topic: PowerVM Driver Meeting)"14:02
edmondswI was out last week, so if anything of significance happened here I missed it14:02
edmondswefried anything of note?14:03
edmondswalrighty then14:04
edmondsw#topic Out-of-Tree driver14:05
*** openstack changes topic to "Out-of-Tree driver (Meeting topic: PowerVM Driver Meeting)"14:05
edmondswanything here?14:05
efriedI'll wait for other topics14:06
edmondsw#topic Device Passthrough14:06
*** openstack changes topic to "Device Passthrough (Meeting topic: PowerVM Driver Meeting)"14:06
edmondswefried ^14:07
efriedso things are getting confusing here.14:07
efriedBear with me for a bit14:07
efriedI proposed the nova-powervm spec, and have a bunch of code up for it.14:08
efriedkosamara (CERN) noticed and proposed essentially the same spec into nova.14:08
efriedIt has been getting reviews.14:08
efriedRecently the elephant in the room was brought up, which is: How does this play with cyborg?14:08
efriedAnd at this point I'm... not exactly stuck, but I haven't yet decided how we're going to answer that, either from a nova perspective or from a nova-powervm perspective.14:09
efriedI *think* in nova we're going to need to essentially abandon the idea of doing anything independent of cyborg.14:10
efriedAnd what that's going to mean in practical terms - despite the cyborg team's best intentions (which are IMO and based on my experience, naïve) - is that there's no way in hell we're going to have anything workable in nova in the Stein timeframe.14:10
efriedso what we need to figure out is what that means for nova-powervm.14:11
efriedbecause we want^Wneed to have something workable in Stein14:11
efriedSo I think this means we're going to need to move forward with something like the plan we started on for nova-powervm. And then around the Train release, we'll need to do another big shift to get things working with cyborg. Or... not.14:12
efrieddepending how far nova gets with that, and how far we're willing to diverge and/or stay separated from what they're doing.14:13
edmondswback up to why nova would abandon the idea of doing anything independent of cyborg14:13
efriedIt would be like saying nova is going to add some kind of new volume support without involving cinder.14:13
efriedor network support without involving neutron.14:14
edmondswis it, though? I don't think so14:14
efriedWell, that's why I said *think*.12:14
edmondswmaybe we need to start with what you mean by independent14:14
efriedRight now nova has the legacy pci passthrough subsystem.14:15
efriedEverybody hates it and agrees it needs to diaf.14:15
efriedand we've all agreed that whatever solution comes next ought to involve placement14:15
efriedAnd in Denver last year (Queens ptg) we started talking about generic device management, inventory/whitelist via yaml, ... all the stuff we've been working on putting together right now.14:16
efriedBut then at some point during Q/R, the cyborg project materialized14:16
efriedand now, device management is recognized as being their bailiwick.14:16
efriedin addition to involving placement14:16
efriedany device management work is also seen as needing to involve cyborg.14:17
edmondswisn't cyborg more about device programming than device management?14:17
efriedLike, I don't see nova deciding it's okay to implement a nova+placement solution in Stein only to have to rework everything to make it a nova+placement+cyborg solution in Train.14:17
efriedand furthermore, the aforementioned naïveté will have us working as if n+p+c could become a reality in Stein14:18
efriedeven though that's IMO a pipe dream.14:18
edmondswand isn't cyborg specific to accelerators, with no intention to have anything to do with other types of devices?14:19
efriededmondsw: To answer your question, the practical *value add* of cyborg, in the short/middle term, is programming accelerators. But their scope is definitely defined to encompass device management in general.14:19
edmondswnot according to them14:19
edmondsw"to provide a general purpose management framework for acceleration resources"12:20
edmondswand I've never heard them mention anything more general14:20
efriedheard where?14:21
efriedIn their meetings? Specs? IRC? Dublin?14:21
efriedThat wiki page hasn't had substantive updates since last November; I wouldn't rely on it as being a current/accurate description of their project's scope.14:22
edmondswyes. I won't claim intimate familiarity with what they're doing, but I've talked to them a few times, read some things on the ML, etc... could certainly have missed this, but it would be a surprise14:22
efriedIt should also be noted that, until recently, there was nobody on that team with a fabulous grasp of English.14:22
edmondswso where are they defining their scope?14:23
efriedWell, here's an example of a spec: http://logs.openstack.org/38/577438/11/check/openstack-tox-docs/cc6ea12/html/specs/rocky/approved/compute-node.html14:24
edmondswagain, that says "for accelerators"14:24
efriedThe bulk of the first section is boilerplate that they're including in all of their specs, and it pretty well describes what they're doing.14:24
efriedOkay, what do you think an accelerator is?14:24
efriedIt certainly encompasses GPUs, which is what we care about right now.14:25
edmondswok, do you think all devices are accelerators?14:25
edmondsw(the correct answer is no)14:25
edmondswe.g. infiniband adapter14:25
efriedof course; not sure how that matters in this context though.14:25
edmondswso what happens to them?14:26
edmondswif the solution must involve cyborg, and cyborg won't have anything to do with infiniband, does the solution not cover infiniband?14:26
edmondswsounds like it14:26
efriedWell, cyborg is going to manage them also. And SR-IOV etc.14:26
edmondswthen they need to state that they're broadening their scope14:27
efriedBut I guess that's been an underlying assumption in the background of discussions rather than explicitly stated in a spec or anything.14:27
edmondswto include more than just accelerators14:27
efriedokay, cool, you should tell them that.14:27
efriedstill not sure how this gets us further along.14:28
efriedlet's say hypothetically that they'll never manage infiniband or SR-IOV.14:28
efriedHow does that help us get non-cyborg management of GPUs into nova in Stein?14:28
edmondswwrong question14:29
edmondswforget schedules until we figure out what the right path is14:29
efriedHow does that help us get non-cyborg management of GPUs into nova?14:30
efriedis that the right question?14:30
edmondswthat's a good topic for conversation including the nova and cyborg teams14:31
edmondswI'm just saying that it seems we need to step back and look at this more generally14:32
efriedright; and last week someone (I don't remember who) asked cyborg to put up a nova-specs doc to describe what they think the *nova* side of things is going to look like.14:32
efriedso I think we'll know more based on the outcome of that14:33
efriedwhatever happens next week.14:33
efriedbtw, who all is going to Denver?14:33
edmondswright... to what extent does cyborg need to be involved when the device is an accelerator? Would need the cyborg guys to chime in there14:33
efriedyou mean when the device is *not* an accelerator?14:33
edmondswand then how do we handle non-accelerators that cyborg doesn't care about?14:34
efriedor was the emphasis on *need*?14:34
edmondswso that we can cover both accelerators and non-accelerators in whatever design is worked out14:34
edmondswI'm fine including cyborg to the extent that makes sense. I just want a plan that covers more than accelerators14:34
edmondswand then when the long-term plan is laid out, we can figure out how to incrementally get there while meeting business objectives along the way14:34
edmondswefried to your questions: 1) let's talk PTG in open discussion, 2) no, 3) no14:35
edmondswrestating... we need the cyborg guys to chime in on the extent to which they need to be involved when the device is an accelerator; that's important for nova to understand while designing a solution that is not accelerator-specific but does support accelerators14:37
edmondswi.e. cyborg definitely needs to be involved here, but not required14:38
efried"here" where?14:38
efriedAre you speaking for nova?14:38
edmondswcyborg definitely needs to be involved in device attachment, but can't be required for devices that cyborg doesn't have anything to do with14:39
efriedokay, and I'm saying I'm pretty sure, long-term, "devices that cyborg doesn't have anything to do with" == {}14:40
efriedbut again, we should be attempting to get clarity on that soon, esp next week.14:41
efriedMy plan for M/T is to be hanging out in the cyborg room.14:42
edmondsw"let's say hypothetically that they'll never manage infiniband or SR-IOV."14:42
efriedyeah; I don't think that's valid, just hypothetical.14:42
edmondswsounds good... we definitely need clarity there14:43
edmondswtable this until that's figured out?14:43
efriedYup. You'll notice I opened up with how I'm confused and unsure and needing discussion/clarity.14:44
edmondswyup. I hope this helped? I'm at least glad to understand what's going on14:45
efriedI made some predictions based on what I've read, discussed, but also sensed and felt as (apparently) purely undercurrents.14:45
efriedWell, no, I don't feel further along on any of that I'm afraid.14:45
efriedbut that's okay, I didn't really expect to.14:46
efriedI was just airing what's been going on (in my head and elsewhere) on the topic.14:46
efriedto get/keep y'all informed.14:46
efriednot totally sure what, if any, action I should be taking this week on the device passthrough front.14:47
efriedother than continuing to review cyborg specs.14:48
efriedI wish Sundar would spend more time in IRC.14:48
efriedso I could like ask him some of these questions.14:48
edmondswfind out whether cyborg has any intention of handling devices that are not accelerators?14:48
edmondswthat seems to be the key14:49
edmondswif you're right and they will handle all devices, then yeah, I totally get why nova would make them integral to the design and we'll have to then figure out how we deal with that in Stein14:50
edmondswbut have to get that answered first so we're not designing based on the wrong assumptions14:51
edmondsw#topic PowerVM CI14:52
*** openstack changes topic to "PowerVM CI (Meeting topic: PowerVM Driver Meeting)"14:52
edmondswmujahidali ^14:52
edmondswlink: http://ci-watch.tintri.com/project?project=nova14:52
mujahidaliWe are facing in-tree failures for almost all the jobs; I tried to look into it but no luck.14:52
edmondsw#link: http://ci-watch.tintri.com/project?project=nova14:52
edmondswyeah, I was just going to ask about that14:53
edmondswmujahidali does it look like the same issue that efried dug into last week?14:53
edmondswefried do you know if that fix merged?14:53
mujahidalinot sure.14:54
efriedhttps://review.openstack.org/#/c/598365/ not yet merged.14:54
efriedI shoulda freakin +W'd it before Sylvain got hold of it.14:54
mujahidaliAll the in-tree failing jobs are failing for the same 39 test cases.14:55
edmondswefried does he not realize this is causing CI runs to fail?14:55
edmondswI'd think there would be a little more urgency to merge and then cleanup in a followup in that case14:55
efriedI would think so too. I hadn't gotten back around to it yet today, but I'll catch up quick and suggest that.14:56
efriedIt's not blocking libvirt CI, so they don't give a shit.14:56
edmondswmujahidali what else?14:57
mujahidalinodepool latest version is 3.2.0 https://zuul-ci.org/docs/nodepool/releasenotes.html14:57
mujahidalion the etherpad, why do we want to upgrade from 0.3.0 to 0.5.0 ??14:57
edmondswmujahidali I think we want to upgrade to *at least* 0.5.015:00
edmondswso 3.2.0 would be fine15:00
edmondswand I'd rather we use the latest we can15:00
efriedwhat could possibly go wrong?15:01
mujahidaliI wanted to try the upgrade of nodepool and installation of zookeeper along with diskimage-builder on the staging environment. esberglu: can I directly do a pip install upgrade of nodepool ??15:01
edmondswI'll let you guys work that out after the meeting15:02
edmondswany other status?15:02
efriedbtw, I unfortunately think holding up https://review.openstack.org/#/c/598365/ to get the conf help text modified is legit, because this is going to be backported, so we want it in one patch.15:02
edmondswefried ack15:03
efriedI'll see if I can light a fire under Matt to do that.15:03
efriedI would do it, but want to retain my +2 power.15:03
mujahidaliefried: there are some dependencies with the jenkins version; if we are upgrading one then we need to upgrade all the other dependents.15:03
edmondswmujahidali I guess we could just add it to our patching file, right?15:03
edmondswmujahidali patch it in for now, so we can see if anything else pops up to cause issues15:04
mujahidaliI think yes.15:04
edmondsw#topic Open Discussion15:06
*** openstack changes topic to "Open Discussion (Meeting topic: PowerVM Driver Meeting)"15:06
edmondswlooks like I will not be attending the PTG next week15:06
edmondswand last I heard they were still trying to get approval for gman-tx and efried15:07
efriedI'm approved and booked15:07
efriedI wanted to bring up that other CI topic15:08
edmondswgo ahead15:08
edmondswoh, yeah, I meant to do that15:08
efriedYou want to take it?15:08
edmondswI added some notes to the CI todo etherpad about it15:08
efriedthought we should bring mujahidali up to speed here15:09
efriedin case he can take action15:09
edmondswmujahidali basically, efried wrote a patch where we make our virt driver not work15:09
edmondswso then we can look at what tests pass there and know that those tests must not really be things we need to test in our CI15:09
edmondswand it turns out there are >700 tests that passed15:10
edmondswI think ideally we work with nova and tempest to develop a solution that will allow us to say "test virt driver" rather than try to skip every individual thing that doesn't touch the virt driver15:10
edmondswso like I said, I threw that on the TODO etherpad15:11
edmondswefried anything to add?15:11
edmondswmujahidali make sense?15:11
efriedI wanted to say15:11
efriedthat if mujahidali has time in the near future, it would be neat to try to assemble that list of 707 tests in its own skip section and/or separate blacklist or whatever15:12
efriedand set us up to somehow run with that blacklist (i.e. the smaller subset of tests) like 90% of the time or something.15:12
mujahidaliefried: you are asking me to add the failed test cases to blacklist ??15:13
efriedNot the failed tests.15:13
efriedThe 707 passing ones.15:13
mujahidaliwhy the passing ones ??15:13
efriedI would like to see it done in such a way that we can toggle it on and off, preferably at run time, preferably automatically.15:13
efriedBecause those are the tests that aren't touching our code, so we don't really care about them.15:13
efriedThere's value to running them every now and then, in case somebody somewhere else makes a change that happens to break specifically when it runs on Power.15:14
efriedBut if we could limit the full run to only a small fraction of the time, it would take a big load off our CI systems, and reduce our run time drastically.15:14
efriedSo like, use a random number and if it's less than a certain threshold, run the full test; otherwise run the subset. Kind of thing.15:15
efriedI would guess that logic would happen in the shell script somewhere. Let me know if you need help coding it up.15:16
mujahidaliSo if we add the 700 passing tests to the blacklist then they will never run, so how are we gonna run the full test ??15:17
efriedthat's what I'm saying, we would want to maintain that blacklist as a separate file; then whenever we don't hit that random threshold, we cat it together with the real blacklist.15:18
edmondswtoggle between different blacklist files15:18
efriedor that15:18
edmondswor what efried said15:18
mujahidalifull_blacklist file ??15:19
mujahidaliI will do that ASAP once the CI is stable.15:19
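The toggle worked out above might be sketched like this. The file split, the 10% figure, and the function name are all assumptions; the real CI would likely implement this in its run scripts:

```python
import random

FULL_RUN_PCT = 10  # roughly 1 run in 10 executes the full tempest suite

def build_blacklist(base_lines, non_driver_lines, roll=None):
    """Return the blacklist for this run as a list of test-id lines.

    base_lines: the permanent blacklist, always skipped.
    non_driver_lines: the ~707 passing tests that never touch the driver;
    appended on most runs so only driver-relevant tests execute.
    roll: 0-99; drawn randomly when not supplied (pass it for testing).
    """
    if roll is None:
        roll = random.randrange(100)
    lines = list(base_lines)
    if roll >= FULL_RUN_PCT:
        lines += list(non_driver_lines)  # reduced run: skip non-driver tests
    return lines
```

The combined list would then be fed to the test runner's blacklist option, so most runs exercise only the driver-relevant subset while the occasional full run still catches breakage elsewhere in the stack.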
*** openstackgerrit has quit IRC15:20
*** openstack changes topic to "This channel is for PowerVM-related development and discussion. For general OpenStack support, please use #openstack."15:21
openstackMeeting ended Tue Sep  4 15:21:32 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:21
openstackMinutes:        http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2018/powervm_driver_meeting.2018-09-04-14.01.html15:21
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2018/powervm_driver_meeting.2018-09-04-14.01.txt15:21
openstackLog:            http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2018/powervm_driver_meeting.2018-09-04-14.01.log.html15:21
efriededmondsw, mujahidali: https://review.openstack.org/#/c/598365/ is merging.15:36
efriedbullied bauzas into it :)15:36
edmondswefried :)15:36
*** mdrabe has quit IRC16:10
*** mujahidali has quit IRC16:32

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!