15:01:15 <krtaylor> #startmeeting third-party
15:01:16 <openstack> Meeting started Wed Apr  1 15:01:15 2015 UTC and is due to finish in 60 minutes.  The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:20 <openstack> The meeting name has been set to 'third_party'
15:01:41 <patrickeast> hi
15:01:43 <krtaylor> who's here for third party ci working group meeting?
15:01:49 <krtaylor> hi patrickeast
15:01:50 * ctlaugh is here
15:01:50 <sweston> o/
15:01:56 <krtaylor> hi sweston
15:01:59 <krtaylor> hi ctlaugh
15:02:03 <zz_ja|> arwe here
15:02:14 <krtaylor> hi zz_ja|
15:02:16 <mmedvede> hi
15:02:23 <krtaylor> hey mmedvede
15:02:55 <krtaylor> how is everyone, hopefully there were not too many pranks played on you today
15:03:14 <ctlaugh> not yet
15:03:31 <krtaylor> hehheh
15:03:37 <krtaylor> here is the agenda
15:03:42 <krtaylor> #link https://wiki.openstack.org/wiki/Meetings/ThirdParty#4.2F1.2F15_1500_UTC
15:04:15 <krtaylor> #topic Topics for discussion at Liberty summit in Vancouver
15:04:35 <krtaylor> I created an etherpad for everyone to put their topic ideas
15:05:05 <krtaylor> if you put an idea there, please put your nick next to it
15:05:27 <krtaylor> like last time, we'll prioritize and select a few things to discuss
15:05:57 <zz_ja|> krtaylor, what does this mean: With functional testing moving back to the projects, what does this mean for third party CI?
15:05:59 <krtaylor> #action krtaylor will see about getting a design session for TPCIWG discussion
15:06:28 <zz_ja|> might be that I've seen this using different keywords and just don't recognize it
15:07:03 <krtaylor> zz_ja|, that's a question about what CI will be required to test since the test suite is shrinking
15:07:13 <asselin> hi
15:07:22 <krtaylor> it may be nothing, just wanted to see if we need to work with the projects to figure that out
15:07:29 <krtaylor> hi asselin
15:07:58 <zz_ja|> you mean b/c "tempest" tests are moving to "nova-tempest" and such?
15:08:13 <krtaylor> that is certainly a lower priority topic, not sure we need to discuss that at summit, but put it there to keep it
15:08:42 <zz_ja|> I took those threads to mean just a change of repo, rather than what tests would have to run
15:09:17 <krtaylor> yes, functional tests are being moved back to the projects, not in tests run per patch, or required to test per patch anyway
15:09:43 <krtaylor> it may be that we just run a weekly periodic, but may not even be required at all
15:10:26 <krtaylor> #link https://etherpad.openstack.org/p/liberty-third-party-ci-working-group
15:10:33 <krtaylor> for completeness
15:10:48 <zz_ja|> I guess I assumed that nova would say "every patch must run nova-tempest" and so on
15:11:14 <krtaylor> I'm not sure we can assume that is a requirement
15:12:01 <krtaylor> anyway, we can discuss that with the projects, it may be discussed in a QA session, so we may not need to worry about it in a TPCIWG session
15:12:22 <krtaylor> any questions about adding session topics to the etherpad?
15:12:46 <krtaylor> #topic In-tree 3rd party ci solution (downstream-puppet)
15:13:05 <krtaylor> lots of patches, this is moving nicely!
15:13:21 <krtaylor> #link https://review.openstack.org/#/q/topic:downstream-puppet,n,z
15:13:32 <asselin> yes, I'd like to get some reviews on the log server patches
15:13:51 <krtaylor> sure asselin, link?
15:14:17 <asselin> there were quite a few ppl interested in participating in the effort: Please take a look as I'd like this to be the 'model' to help do the others
15:14:41 <asselin> #link https://review.openstack.org/#/q/topic:downstream-puppet+owner:%22Ramy+Asselin+%253Cramy.asselin%2540hp.com%253E%22+status:open,n,z
15:14:45 <ctlaugh> could I get a brief description of what this is?
15:14:58 <sweston> asselin: +1
15:14:59 <asselin> ctlaugh, the log server?
15:15:16 <krtaylor> or the effort overall?
15:15:19 <ctlaugh> sorry, downstream-puppet
15:16:09 <asselin> #link http://specs.openstack.org/openstack-infra/infra-specs/specs/openstackci.html
15:16:12 <asselin> ctlaugh, ^^
15:16:45 <ctlaugh> thank you
15:17:08 <asselin> idea is to get scripts reusable by all. eventually get a 'common' ci solution
15:17:15 <ctlaugh> I love that idea
15:17:45 <asselin> ctlaugh, great! :)
15:18:08 <krtaylor> ++
15:18:37 <krtaylor> it will be great going forward, unify the work
15:18:50 <krtaylor> asselin, thanks again for all your leadership in this effort
15:18:52 <ctlaugh> One of my biggest concerns setting our system up (using an external repo such as asselin's) is that changes upstream do not take into account any of the downstream systems that are using it.
15:19:08 <zz_ja|> we will be jumping on this the instant we have our "hw" procured
15:19:20 <krtaylor> yes, exactly, or how they will potentially be impacted
15:20:03 <krtaylor> ok, any other questions on the downstream-puppet work?
15:20:28 <krtaylor> #topic Repo for third party tools
15:20:44 <asselin> yes. please review, download patches and try them out. See if they would work in 'your' environment.
15:21:31 <krtaylor> internally, I was trying to get permission to add our tools to the index via our personal github accounts, but that is going to be too painful
15:22:17 <krtaylor> so I am going to start up the process for creating a stackforge repo for us to share configs, tools, things we need to share for TPCIWG, to help each other
15:22:42 <krtaylor> I was hoping to have the bandwidth by now, silly day job...
15:23:18 <krtaylor> but we do have the TPCIWG wiki index, if anyone wants to add to it for now
15:23:37 <krtaylor> #link https://wiki.openstack.org/wiki/ThirdPartyCIWorkingGroup#Third_Party_CI_System_Tools_Index
15:24:07 <krtaylor> ok, probably not much to discuss there, any questions?
15:24:45 <krtaylor> #topic monitoring dashboard
15:25:04 <krtaylor> great progress here, thanks for the re-write sweston!
15:25:26 <asselin> +1
15:25:36 <krtaylor> #link https://review.openstack.org/#/c/135170/
15:25:37 <sweston> you bet ... just need to respond to jhesketh's comments, and this might merge soon ;-)
15:25:48 <patrickeast> definitely good progress, i really like the updates
15:25:57 <sweston> patrickeast: thanks
15:26:12 <krtaylor> yes, but even with those, he supported it as is
15:26:24 <sweston> krtaylor, asselin: thanks for your help as well
15:26:51 <krtaylor> no problem, thanks sweston for putting it on a diet and getting it rolling again
15:27:03 <sweston> krtaylor: you're welcome
15:28:24 <krtaylor> ok, so any questions for the monitoring dashboard effort?
15:29:10 <krtaylor> alright, on to my favorite part of the meeting
15:29:17 <krtaylor> #topic Highlighting Third-Party CI Service
15:29:51 <krtaylor> today we have ctlaugh presenting on Linaro CI system for ARM testing
15:30:20 <krtaylor> ctlaugh, the floor is yours
15:30:24 <ctlaugh> thank you
15:30:28 <ctlaugh> I am working with Linaro to set up a 3rd-party CI system to test OpenStack on arm64.  Our intent is to ultimately just run as much of Tempest as we reasonably can, given the compute resources that we have available to scale with.  Currently, I am triggering off of changes to openstack/nova and openstack-dev/devstack.
15:30:46 <ctlaugh> The hardware we are using is an HP Moonshot enclosure filled with a combination of HP ProLiant m300 (amd64) nodes (about 15) and HP ProLiant m400 (arm64) nodes (about 20).
15:30:55 <ctlaugh> I am using asselin’s os-ext-testing fork on github to deploy the setup.
15:31:10 <ctlaugh> The OpenStack cloud we are using is deployed using Juju mainly on the amd64 nodes, with the compute nodes deployed on the arm64 nodes.
15:31:12 <ctlaugh> The various OpenStack CI components are also all deployed on amd64 nodes.
15:31:35 <ctlaugh> I started out by first setting up the entire system all on amd64 nodes in order to make sure everything was set up and working properly.
15:31:53 <ctlaugh> After a lot of help from asselin, a few patches to his fork, and a bit of tweaking, everything seemed to be running ok for a start.  The main problem I kept experiencing was Jenkins failures:  about 30-40% of the time, Jenkins would throw an exception and fail the test saying it lost connectivity with a slave.
15:31:54 * sweston does not want to see ctlaugh's power bill
15:32:01 <ctlaugh> he he
15:32:02 <krtaylor> ctlaugh, are you initially focused on running against nova patches?
15:32:12 <ctlaugh> Yes - for now
15:32:53 <ctlaugh> We aren't entirely sure what the best thing will be to trigger off of long-term, since we aren't necessarily focused on testing a specific project.
15:33:26 <ctlaugh> Nova is probably the one single project we are most interested in (for now), but we also want to be able to demonstrate that everything runs on arm64.
15:33:40 <sweston> ctlaugh: are you using single use slaves?  this was an issue I had when I was starting out with CI .. cruft from previous runs was causing instability
15:33:46 <krtaylor> ctlaugh, we are in a similar situation for PowerKVM, since we are re-using the existing libvirt driver
15:34:01 <ctlaugh> sweston: yes, using single-use slaves
15:35:11 <ctlaugh> krtaylor: right now, we are focused on KVM + libvirt, but in time, we might also be interested in Xen on arm64.  There is a concern that it will become unsupported by OpenStack since no one is testing it.
15:35:27 <asselin> ctlaugh, currently i'm experimenting with disabling c-states in the bios & power management. seems to be helping so far.
15:35:48 <ctlaugh> asselin: helping with what?
15:35:55 <krtaylor> ctlaugh, yes, we have a different need to show openstack works on our platform instead of just testing a single driver as most third party ci is focused on
15:36:03 <asselin> ctlaugh, running jenkins slaves
15:36:12 * asselin looks up link
15:36:23 <ctlaugh> asselin: ah -- interesting
15:36:48 <asselin> #link http://support.citrix.com/article/CTX127395
15:37:13 <asselin> got my ideas from this ^^, although we have different HW, etc.
15:37:49 <ctlaugh> Right now, I am in the process of replacing the amd64 compute nodes I have been testing with so far, and switching to arm64 compute nodes.  I am currently working on the nodepool scripts to get them to deploy on arm64 correctly, and that is probably what will require the most changes to get right.
15:38:05 <ctlaugh> I have already submitted a change to openstack-infra/puppet-jenkins in order to correct the JDK deployment to not be amd64-specific.  It is currently awaiting review.
15:38:34 <ctlaugh> I have also identified (I think) a few changes needed in openstack-infra/system-config and openstack-infra/project-config (related to how puppet gets installed), though I am still doing some testing.
15:38:45 <ctlaugh> I don’t know yet if that’s all that is going to be required, or if I will run into other problems once I try to run on arm64.
15:38:45 <krtaylor> ctlaugh, link? I'll review
15:38:46 <ctlaugh> A big concern is with Jenkins - whether the problems I was seeing there will be about the same as before, or be worse.
15:39:13 * ctlaugh is looking...
15:39:36 <ctlaugh> #link https://review.openstack.org/#/c/168222/
15:40:06 <krtaylor> perfect, thanks
15:40:07 <ctlaugh> So that's about it...
15:40:10 <krtaylor> thanks too
15:40:45 <ctlaugh> Any other questions (or suggestions)?
15:41:12 <krtaylor> ctlaugh, so any ideas on the 30-40% failures, lost connection problems?
15:42:23 <ctlaugh> krtaylor: not yet.  At one point, I thought it might have been related to how much load each of the nodes was under.  It _seemed_ to get worse the more VMs I had running on a single nova-compute node.
15:42:25 <asselin> ctlaugh, in my case I was looking at the nova console.log files for the jenkins slave which led me to that article about adjusting power states
15:42:51 <ctlaugh> The amd64 nodes I have are not that great -- the processors are actually atom-based.
15:43:06 <krtaylor> ctlaugh, are they the same failures each time, or transient?
15:43:39 <ctlaugh> When it fails, it's the same each time.  I'll post a link to the failure as soon as I find an example.
15:44:45 <ctlaugh> My hope is that when I start using the more capable arm64 nodes, I'll have fewer failures.
15:45:27 <ctlaugh> I don't have a convenient way to grab the error text at the moment.
15:45:45 <krtaylor> ctlaugh, we had an increase in stability and test execution (reducing skips) by reducing IO blocks
15:46:41 <krtaylor> mmedvede on my team has been driving that effort for us, may be a good contact for you
15:46:53 <ctlaugh> great - thank you
15:47:49 <mmedvede> ctlaugh: I can look at errors/logs if you'd like
15:48:02 <krtaylor> ctlaugh, any tools you use in your environment that really help you?
15:48:39 <krtaylor> ctlaugh, anything you came up with that may help others?
15:48:43 <ctlaugh> krtaylor: not really anything extra.  I use MAAS and Juju to manage deployment of all of the hardware, mainly because that's the easiest way to do it.
15:49:02 <ctlaugh> Otherwise, so far it's just the bits deployed by asselin's scripts.
15:49:46 <ctlaugh> If Jenkins doesn't work out, that might change and I may have a replacement to share with others :)
15:50:44 <krtaylor> ctlaugh, ok, thanks, that's good to know, feel free to add anything you come up with to the tools index
15:51:00 <ctlaugh> ok, I will
15:51:19 <krtaylor> asselin, one more successful deployment from your repo :)
15:51:34 <asselin> great! :)
15:52:06 <asselin> ctlaugh, thanks for the pr's
15:52:25 <krtaylor> ok, so any other questions for ctlaugh about his ARM64 test environment?
15:53:55 <krtaylor> ctlaugh, thanks for coming and talking about your environment, it's really good to get to know more about others' systems
15:53:56 <asselin> ctlaugh, thanks for sharing
15:54:09 <ctlaugh> you're welcome -- it was a pleasure
15:54:28 <krtaylor> ok, then
15:54:31 <krtaylor> #topic Open Discussion
15:54:41 <krtaylor> anything anyone wants to discuss?
15:55:24 * krtaylor notes it is rare that we have time to get to Open Discussion
15:57:20 <krtaylor> ok, well if nothing else, then I'll wrap this up
15:57:35 <krtaylor> don't forget to add your summit topics to the etherpad
15:58:07 <krtaylor> thanks everyone, thanks ctlaugh, another great meeting!
15:58:19 <sweston> thanks, krtaylor
15:58:34 <asselin> thanks
15:58:44 <ctlaugh> thank you, everyone
15:59:01 <krtaylor> #endmeeting