15:02:04 <lennyb> #startmeeting third-party
15:02:05 <openstack> Meeting started Mon Mar 20 15:02:04 2017 UTC and is due to finish in 60 minutes.  The chair is lennyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:08 <openstack> The meeting name has been set to 'third_party'
15:02:13 <lennyb> Hello
15:03:32 <mmedvede> o/
15:03:41 <lennyb> mmedvede, hi,
15:03:59 <lennyb> do you run fedora 24 in your CI?
15:04:05 <asselin> o/
15:04:41 <lennyb> hello asselin
15:04:50 <mmedvede> lennyb: no, we only run Ubuntu at the moment.
15:04:53 <pots> o/
15:04:54 <lennyb> I am getting strange error http://paste.openstack.org/show/603440/
15:04:58 <lennyb> hello pots
15:05:15 <lennyb> mmedvede, what version of Ubuntu?
15:05:43 <mmedvede> lennyb: our dsvm tests use Xenial
15:06:55 <lennyb> mmedvede, I see.
15:07:37 <lennyb> asselin, btw, your suggestion regarding the zuul configuration for single-use nodes is not working for us, since we use the multijob plugin.
15:08:02 <lennyb> pots, asselin, mmedvede - anything you would like to discuss today?
15:08:04 <asselin> what is the multijob plugin? Is that jenkins?
15:08:37 <lennyb> asselin, yes jenkins multijob plugin https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin
15:08:55 <lennyb> one job that runs a list of other jobs
15:09:27 <lennyb> it is also supported by JJB
15:10:40 * asselin_ having network issues
15:11:09 <lennyb> I also face something similar to #link  https://bugs.launchpad.net/zuul/+bug/1270029  with zuul 2.5.1. I am trying the proposed workaround.
15:11:09 <openstack> Launchpad bug 1270029 in Zuul "zuul doesn't connect to gearman server" [Undecided,New]
15:11:32 <asselin_> lennyb: I see...I don't think zuul and multijob plugin will play nice together.
15:12:18 <mptacekx> lennyb: I think I saw this as well
15:12:31 <lennyb> asselin_, is there a way to make jobs depend on each other in zuul?
15:13:01 <lennyb> like Build JOB -> Install JOB -> Test Job
15:13:29 <asselin_> lennyb: I don't think so...maybe zuul v3 has that.
15:13:51 <asselin_> lennyb: otherwise zuul does have post-build jobs ability, but not quite like you want
15:13:57 <mmedvede> zuul v3 (maybe even 2.5) should have that
15:13:59 <lennyb> mptacekx, what are you doing to overcome it? mmedvede restarts zuul once a week.
15:14:38 <lennyb> mmedvede, I will check 2.5, thanks
15:15:10 <asselin_> lennyb: but you should be able to structure the individual parts of the job in JJB, then combine them into one high level job...but you do lose the parallel part that I suppose you're looking for.
15:15:23 <mptacekx> lennyb: I saw it just once in a couple of months with zuul 2.5.1, had to restart zuul to wake it up. Not a real cure
15:16:53 <lennyb> asselin_, I use multijob plugin for this.
15:17:41 <lennyb> I can also run some jobs in parallel, there.
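lennyb's chaining (Build -> Install -> Test, with some jobs run in parallel) follows the Multijob plugin's model: phases run one after another and the jobs inside a phase run concurrently. The following is a minimal Python sketch of that execution model only, not the plugin's API; the `run_phases` helper and the job names are hypothetical, and it assumes a failed phase stops the chain (the plugin lets you choose a continuation condition per phase).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A "job" is any callable returning True on success. This is a hypothetical
# stand-in for a Jenkins build; nothing here talks to Jenkins or Multijob.
Job = Callable[[], bool]


def run_phases(phases: List[Dict[str, Job]]) -> Dict[str, bool]:
    """Run phases in order; jobs inside one phase run concurrently.

    Stops after the first phase in which any job fails, since later
    phases (e.g. Test) depend on earlier ones (e.g. Build, Install).
    """
    results: Dict[str, bool] = {}
    for phase in phases:
        with ThreadPoolExecutor(max_workers=max(1, len(phase))) as pool:
            futures = {name: pool.submit(job) for name, job in phase.items()}
            phase_results = {name: f.result() for name, f in futures.items()}
        results.update(phase_results)
        if not all(phase_results.values()):
            break  # assumed continuation rule: only continue on full success
    return results


if __name__ == "__main__":
    phases = [
        {"build": lambda: True},
        {"install": lambda: True},
        {"test-fc": lambda: True, "test-iscsi": lambda: False},
    ]
    print(run_phases(phases))
```

Running the example reports each job's result; a failure in the first phase would leave the install and test jobs out of the result dict entirely.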
15:18:03 <lennyb> Is there anything else you would like to share/ask/discuss today?
15:18:09 <asselin_> lennyb: so remind me what the issue is again? Using multijob plugin with nodepool?
15:18:59 <lennyb> asselin_, I must add a quiet-period of 1 min to a job, since it takes time for nodepool to remove a slave from jenkins.
15:19:41 <lennyb> and since the whole job takes about 10 min, 3 minutes of quiet-period is 30% of the run time :)
15:20:13 <lennyb> but it's not that critical, so let's move on to other things, if there are any
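The 30% figure is simply the total quiet-period wait divided by the run time. A small Python sketch of that arithmetic; the helper name is hypothetical, and the split into three chained sub-jobs with 1 minute each is an assumption used to reconcile the "1 min" and "3 minutes" figures above, not something stated in the meeting.

```python
def quiet_period_overhead(quiet_minutes: float, run_minutes: float) -> float:
    """Return the quiet-period wait as a fraction of the run time."""
    if run_minutes <= 0:
        raise ValueError("run_minutes must be positive")
    return quiet_minutes / run_minutes


# Assumption: three chained sub-jobs (e.g. Build -> Install -> Test),
# each configured with the 1-minute quiet period mentioned above.
per_job_quiet_minutes = 1
chained_jobs = 3
total_quiet = per_job_quiet_minutes * chained_jobs  # 3 minutes

print(f"{quiet_period_overhead(total_quiet, 10):.0%}")  # 30%
```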
15:20:15 <asselin_> lennyb: ok I see...yeah, might need to use that workaround since the tooling isn't designed to work as you want.
15:20:32 <pots> lennyb: i tried the quiet-period change but it had no effect--is there any way to fix this in zuul?
15:20:34 <asselin_> lennyb: ok. btw in zuulv3 it will switch to ansible and you'll have better support for the use case
15:21:09 <lennyb> pots, did you try asselin_'s suggestion from last week?
15:21:32 <pots> i don't recall what it was?
15:22:38 * lennyb looking
15:22:58 <lennyb> #link  http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2017-03-13.log.html
15:24:24 <asselin_> this here? http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2017-03-13.log.html#t2017-03-13T15:04:54
15:24:24 <pots> oops, i did not see that.
15:24:50 <lennyb> anything else on this issue?
15:25:42 <lennyb> any other issues?
15:25:58 <pots> i've already got the parameter-function: single_use_node entry
15:26:16 <lennyb> pots, what is not working in your setup?
15:26:48 <asselin_> pots: are you using it in zuul's layout.yaml?
15:27:25 <pots> i'm using openstackci-puppet on top of devstack, with two jobs defined (one for fc, one iscsi).  whenever a job completes, it looks like zuul starts the next job using the old nodepool vm, which fails.  then nodepool comes around and kills the old vm a minute later.
15:27:44 <pots> so basically every other job fails
15:27:53 <asselin_> pots: can you share your zuul layout.yaml file?
15:28:16 <lennyb> pots, do you have jenkins?
15:28:20 <asselin_> does it look like what I shared before? http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2017-03-13.log.html#t2017-03-13T15:05:19
15:28:32 <asselin_> - name: ^dsvm-tempest.*$     parameter-function: single_use_node
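In the layout.yaml entry above, a job `name` starting with `^` is treated by Zuul as a regular expression, so `single_use_node` is only attached to jobs whose names match `^dsvm-tempest.*$`. If the FC and iSCSI jobs pots mentions are named differently, the parameter function would never run for them. A quick Python check with the same pattern; the job names in the usage are hypothetical.

```python
import re

# Same pattern as the layout.yaml example; Zuul matches from the start of the name.
SINGLE_USE_PATTERN = re.compile(r"^dsvm-tempest.*$")


def gets_single_use_node(job_name: str) -> bool:
    """True if a layout entry with this pattern would apply to job_name."""
    return SINGLE_USE_PATTERN.match(job_name) is not None


# Hypothetical job names, for illustration only.
for name in ["dsvm-tempest-vendor-fc", "vendor-iscsi-dsvm-tempest"]:
    print(name, gets_single_use_node(name))
```

If a CI's job names do not share a common prefix, one option is a separate layout entry (or a broader pattern) per job.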
15:29:33 <pots> yes, it looks like that.
15:29:41 <pots> sorry, can't seem to cut & paste today
15:29:49 <asselin_> btw I need to leave in 2 mins. pots if you still have issues we can discuss in openstack-infra. Others there might be able to help.
15:30:04 <asselin_> discuss later*
15:30:04 <lennyb> thanks, asselin_
15:30:10 <pots> ok, i will see you there.  thanks!
15:30:28 * asselin_ leaves now
15:30:37 <lennyb> pots, do you use jenkins?
15:30:59 <pots> yes, i'm following the recipe for openstackci-puppet.  same issue with jenkins 2.x and 1.651 and 1.656
15:32:00 <lennyb> pots, do you see 'quiet period' in the jenkins job?
15:32:32 <pots> yes
15:32:35 <clarkb> newer jenkins filters out injected vars by default. Zuul coordinates single use VM action with jenkins via a job parameter iirc. You may need to whitelist that var
15:33:21 <lennyb> pots, so quiet period should work anyway; maybe increase the delay.
15:33:22 <pots> i got the impression that quiet-period only affected the normal jenkins repo triggers and would not necessarily impact gearman
15:33:49 <pots> but i will try a much larger value and see if it has any effect.
15:34:29 <lennyb> pots, in our case Jenkins waits before starting the job; in the meantime nodepool has enough time to delete the slave from jenkins
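The quiet-period workaround boils down to a timing race: the next build must not start before nodepool has removed the used slave. A deliberately simplified Python model, assuming the next build is queued at the moment the previous one ends and that the quiet period is actually honored for Gearman-triggered builds (which pots questioned above). The function name is hypothetical.

```python
def may_reuse_stale_node(quiet_period_s: float, nodepool_delete_delay_s: float) -> bool:
    """Simplified race check, timing both delays from the moment a build ends.

    Jenkins starts the next queued build after `quiet_period_s`; nodepool
    removes the used slave after `nodepool_delete_delay_s`. If the build
    starts first (or at the same moment), it can still land on the old slave.
    """
    return quiet_period_s <= nodepool_delete_delay_s


# pots observed nodepool deleting the old VM roughly a minute after completion.
print(may_reuse_stale_node(60, 60))   # True: a 1-minute quiet period is borderline
print(may_reuse_stale_node(180, 60))  # False: the slave is gone before the build starts
```

On slow systems the deletion delay grows, which is one reason a quiet period that works in one CI may not be enough in another.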
15:36:42 <pots> i was wondering if there might be some way to modify zuul/gearman behavior.  e.g. doesn't gearman choose a specific executor before giving the job to jenkins?
15:37:16 <lennyb> pots, try asking in infra channel
15:38:30 <pots> will do.  one last question, i was wondering what the conventional wisdom was re: setting up a new cloud for the CI to run on, something closer to a devstack that can survive a reboot?
15:39:03 <lennyb> pots, we have CI on RDO based cloud
15:39:13 <clarkb> pots: lennyb ^ see my earlier comment
15:39:49 <pots> clarkb thanks, i resolved that issue earlier.
15:39:50 <lennyb> pots, but we are considering evaluating Fuel as a cloud provider for CI
15:39:59 <mmedvede> pots: we have a full blown cloud that we maintain (not based on devstack)
15:40:16 <clarkb> pots: did you resolve it by whitelisting vars? and if so did you whitelist the var that tells gearman to not reuse a host?
15:40:37 <clarkb> it's something like OFFLINE_NODE or something
15:41:03 <mmedvede> clarkb: I believe the way most third-party CIs handle this is to disable the security feature of jenkins, so there is no need to whitelist variables. But pots' setup might be different
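On the variable filtering clarkb and mmedvede mention: since the SECURITY-170 fix (Jenkins 2.3 and LTS 1.651.2), Jenkins drops build parameters that are not declared on the job. It offers two system properties: `hudson.model.ParametersAction.keepUndefinedParameters=true` restores the old behaviour (roughly the "disable the security feature" approach), while `hudson.model.ParametersAction.safeParameters` whitelists a comma-separated list of names. The hypothetical helper below only builds the second flag; the parameter names in the usage line are an assumed, incomplete subset of what Zuul sends.

```python
from typing import Iterable


def safe_parameters_flag(names: Iterable[str]) -> str:
    """Build the JVM option that whitelists extra build parameters in Jenkins."""
    # De-duplicate, drop blanks, and sort so the flag is stable across runs.
    cleaned = sorted({n.strip() for n in names if n.strip()})
    return "-Dhudson.model.ParametersAction.safeParameters=" + ",".join(cleaned)


# Assumed subset of Zuul-injected parameters, for illustration only.
print(safe_parameters_flag(["ZUUL_UUID", "ZUUL_PROJECT", "OFFLINE_NODE_WHEN_COMPLETE"]))
```

The resulting option is typically added to the Jenkins JVM arguments (for example `JAVA_ARGS` in `/etc/default/jenkins` on Ubuntu packages), followed by a Jenkins restart.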
15:41:13 <pots> clarkb: I thought I whitelisted everything
15:42:31 <pots> clarkb: asselin_ showed me an additional parameter for jenkins that works.
15:44:50 <pots> i don't see any OFFLINE_XXX parameter in the jenkins jobs
15:46:56 <lennyb> pots, asselin_ proposed a zuul-based param (I don't think it will be reflected in Jenkins). I proposed quiet-period, which can be seen in jenkins.
15:47:19 <clarkb> params['OFFLINE_NODE_WHEN_COMPLETE'] = '1' is what single_use_node sets in the gearman communication from zuul to jenkins
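A sketch of what a Zuul 2.x parameter function like `single_use_node` looks like, based on clarkb's description. Zuul 2.x calls the function named by `parameter-function` with the queue item, the job, and the dict of parameters it is about to send to Jenkins over Gearman, and the function mutates that dict in place. This is an illustrative reconstruction, not the exact openstack-infra code, which may set other parameters as well.

```python
from typing import Any, Dict


def single_use_node(item: Any, job: Any, params: Dict[str, str]) -> None:
    """Mark the build so its Jenkins slave is taken offline when it completes.

    `item` and `job` are Zuul objects in a real deployment; this sketch
    ignores them and only touches the parameter dict.
    """
    # Read on the Jenkins side (gearman plugin) to offline the node after the build.
    params['OFFLINE_NODE_WHEN_COMPLETE'] = '1'


params: Dict[str, str] = {}
single_use_node(None, None, params)
print(params)
```

Because this value arrives as an undeclared build parameter, it is exactly the kind of variable that newer Jenkins versions filter out unless it is whitelisted.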
15:47:47 <pots> so that should appear in the job parameters in the jenkins UI?
15:49:18 <clarkb> it may, jenkins is weird about what params it shows iirc
15:51:11 <pots> it seems like a race condition, i think when i looked at the logs you could see the wrong thing happening in the same second.
15:51:36 <pots> my systems are very slow, so that just makes everything more fun.
15:52:26 <pots> lennyb so would you recommend chucking xenial/devstack for RHEL7/packstack for an all-in-one to run the CI on?
15:52:36 <clarkb> the actual implementation of offline node when complete should toggle the offline bit while the job is running, then jenkins puts it into effect when the job is completed (so there could be a race there somewhere)
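A toy Python model of the mechanism clarkb describes: the "offline when complete" flag is recorded while the build runs and only takes effect when the build finishes, after which no new build may be assigned to that node. The `Node` and `Scheduler` classes are hypothetical and are not Jenkins or gearman-plugin code. The model applies the flag atomically, so it does not reproduce the race; in a real setup, if the offline transition lands after the next build has already been assigned, you get the every-other-job failures pots described.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    online: bool = True
    offline_when_complete: bool = False  # recorded while the build runs
    busy: bool = False


@dataclass
class Scheduler:
    nodes: List[Node] = field(default_factory=list)

    def start_build(self, single_use: bool) -> Optional[Node]:
        """Assign a build to the first online, idle node, if any."""
        for node in self.nodes:
            if node.online and not node.busy:
                node.busy = True
                node.offline_when_complete = single_use
                return node
        return None  # no usable node: the build waits for a fresh one

    def complete_build(self, node: Node) -> None:
        node.busy = False
        if node.offline_when_complete:
            node.online = False  # the flag only takes effect at completion


if __name__ == "__main__":
    sched = Scheduler([Node("vm-1")])
    node = sched.start_build(single_use=True)
    sched.complete_build(node)
    # vm-1 is now offline, so the next build must wait for a new node.
    print(sched.start_build(single_use=True))  # None
```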
15:53:58 <pots> any way to debug the gearman stuff?
15:54:02 <lennyb> pots, I am not familiar with xenial/packstack. we use rdo #link https://www.rdoproject.org . It's Red Hat oriented, but not really packstack
15:55:09 <pots> i'm just looking for an All-In-One solution that works out of the box
15:56:29 <lennyb> pots, rdo works out of the box. to be honest, we had to add networks manually, but it was quite straightforward. I heard a lot of positive things about Fuel, but have not used it yet
15:56:50 <pots> i will look into that, thanks.
15:57:19 <lennyb> any other shares/questions/proposals ? before our time ends?
15:57:24 <pots> i tried openstack-ansible but it did not pass its own tests
15:57:51 <pots> lennyb: not for me, thanks all
15:58:38 <lennyb> Great, mmedvede, clarkb, pots, mptacekx, asselin_ .. thanks. see you next week
15:58:41 <clarkb> pots: yes one thing you could do to help debug it is while a job is running use the nodepool hold command to hold that node (it won't be deleted) then check that the node is marked offline properly when the job completes
15:59:34 <lennyb> clarkb, can I 'release' node without deleting it after hold?
16:00:33 <clarkb> release it from what perspective? you can delete it in jenkins so jenkins will stop doing things to it
16:00:43 <clarkb> but if you delete it in nodepool it will get deleted from the cloud
16:01:47 <lennyb> clarkb, ignore my question. thanks.
16:01:54 <lennyb> #endmeeting