15:01:15 #startmeeting third-party
15:01:16 Meeting started Wed Apr 1 15:01:15 2015 UTC and is due to finish in 60 minutes. The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:20 The meeting name has been set to 'third_party'
15:01:41 hi
15:01:43 who's here for the third party ci working group meeting?
15:01:49 hi patrickeast
15:01:50 * ctlaugh is here
15:01:50 o/
15:01:56 hi sweston
15:01:59 hi ctlaugh
15:02:03 arwe here
15:02:14 hi zz_ja|
15:02:16 hi
15:02:23 hey mmedvede
15:02:55 how is everyone, hopefully there were not too many pranks played on you today
15:03:14 not yet
15:03:31 hehheh
15:03:37 here is the agenda
15:03:42 #link https://wiki.openstack.org/wiki/Meetings/ThirdParty#4.2F1.2F15_1500_UTC
15:04:15 #topic Topics for discussion at Liberty summit in Vancouver
15:04:35 I created an etherpad for everyone to put their topic ideas
15:05:05 if you put an idea there, please put your nick next to it
15:05:27 like last time, we'll prioritize and select a few things to discuss
15:05:57 krtaylor, what does this mean: With functional testing moving back to the projects, what does this mean for third party CI?
15:05:59 #action krtaylor will see about getting a design session for TPCIWG discussion
15:06:28 might be that I've seen this using different keywords and just don't recognize it
15:07:03 zz_ja|, that's a question about what CI will be required to test since the test suite is shrinking
15:07:13 hi
15:07:22 it may be nothing, just wanted to see if we need to work with the projects to figure that out
15:07:29 hi asselin
15:07:58 you mean b/c "tempest" tests are moving to "nova-tempest" and such?
15:08:13 that is certainly a lower priority topic, not sure we need to discuss that at summit, but put it there to keep it
15:08:42 I took those threads to mean just a change of repo, rather than what tests would have to run
15:09:17 yes, functional tests are being moved back to the projects, not in the tests run per patch, or required per patch anyway
15:09:43 it may be that we just run a weekly periodic, but it may not even be required at all
15:10:26 #link https://etherpad.openstack.org/p/liberty-third-party-ci-working-group
15:10:33 for completeness
15:10:48 I guess I assumed that nova would say "every patch must run nova-tempest" and so on
15:11:14 I'm not sure we can assume that is a requirement
15:12:01 anyway, we can discuss that with the projects, it may be discussed in a QA session, so we may not need to worry about it in a TPCIWG session
15:12:22 any questions about adding session topics to the etherpad?
15:12:46 #topic In-tree 3rd party ci solution (downstream-puppet)
15:13:05 lots of patches, this is moving nicely!
15:13:21 #link https://review.openstack.org/#/q/topic:downstream-puppet,n,z
15:13:32 yes, I'd like to get some reviews on the log server patches
15:13:51 sure asselin, link?
15:14:17 there were quite a few ppl interested in participating in the effort: please take a look as I'd like this to be the 'model' to help do the others
15:14:41 #link https://review.openstack.org/#/q/topic:downstream-puppet+owner:%22Ramy+Asselin+%253Cramy.asselin%2540hp.com%253E%22+status:open,n,z
15:14:45 could I get a brief description of what this is?
15:14:58 asselin: +1
15:14:59 ctlaugh, the log server?
15:15:16 or the effort overall?
15:15:19 sorry, downstream-puppet
15:16:09 #link http://specs.openstack.org/openstack-infra/infra-specs/specs/openstackci.html
15:16:12 ctlaugh, ^^
15:16:45 thank you
15:17:08 the idea is to get scripts reusable by all, and eventually get a 'common' ci solution
15:17:15 I love that idea
15:17:45 ctlaugh, great! :)
15:18:08 ++
15:18:37 it will be great going forward, unify the work
15:18:50 asselin, thanks again for all your leadership in this effort
15:18:52 One of my biggest concerns setting our system up (using an external repo such as asselin's) is that changes upstream do not take into account any of the downstream systems that are using it.
15:19:08 we will be jumping on this the instant we have our "hw" procured
15:19:20 yes, exactly, or how they will potentially be impacted
15:20:03 ok, any other questions on the downstream-puppet work?
15:20:28 #topic Repo for third party tools
15:20:44 yes. please review, download patches and try them out. See if they would work in 'your' environment.
15:21:31 internally, I was trying to get permission to add our tools to the index via our personal github accounts, but that is going to be too painful
15:22:17 so I am going to start up the process for creating a stackforge repo for us to share configs, tools, things we need to share for TPCIWG, to help each other
15:22:42 I was hoping to have the bandwidth by now, silly day job...
15:23:18 but we do have the TPCIWG wiki index, if anyone wants to add to it for now
15:23:37 #link https://wiki.openstack.org/wiki/ThirdPartyCIWorkingGroup#Third_Party_CI_System_Tools_Index
15:24:07 ok, probably not much to discuss there, any questions?
15:24:45 #topic monitoring dashboard
15:25:04 great progress here, thanks for the re-write sweston!
15:25:26 +1
15:25:36 #link https://review.openstack.org/#/c/135170/
15:25:37 you bet ... just need to respond to jhesketh's comments, and this might merge soon ;-)
15:25:48 definitely good progress, i really like the updates
15:25:57 patrickeast: thanks
15:26:12 yes, but even with those, he supported it as is
15:26:24 krtaylor, asselin: thanks for your help as well
15:26:51 no problem, thanks sweston for putting it on a diet and getting it rolling again
15:27:03 krtaylor: you're welcome
15:28:24 ok, so any questions for the monitoring dashboard effort?
15:29:10 alright, on to my favorite part of the meeting
15:29:17 #topic Highlighting Third-Party CI Service
15:29:51 today we have ctlaugh presenting on the Linaro CI system for ARM testing
15:30:20 ctlaugh, the floor is yours
15:30:24 thank you
15:30:28 I am working with Linaro to set up a 3rd-party CI system to test OpenStack on arm64. Our intent is to ultimately just run as much of Tempest as we reasonably can, given the compute resources that we have available to scale with. Currently, I am triggering off of changes to openstack/nova and openstack-dev/devstack.
15:30:46 The hardware we are using is an HP Moonshot enclosure filled with a combination of HP Proliant m300 (amd64) nodes (about 15) and HP Proliant m400 (arm64) nodes (about 20).
15:30:55 I am using asselin's os-ext-testing fork on github to deploy the setup.
15:31:10 The OpenStack cloud we are using is deployed using Juju, mainly on the amd64 nodes, with the compute nodes deployed on the arm64 nodes.
15:31:12 The various OpenStack CI components are also all deployed on amd64 nodes.
15:31:35 I started out by first setting up the entire system all on amd64 nodes in order to make sure everything was set up and working properly.
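
For readers unfamiliar with what "triggering off of changes to openstack/nova and openstack-dev/devstack" means in practice, the Python sketch below shows the Gerrit event-stream mechanism such a trigger ultimately relies on. It is only an illustration under assumptions, not ctlaugh's actual setup (his system uses the Zuul/Jenkins pieces deployed by asselin's scripts); the CI account name and key path are placeholders.

    # Minimal sketch: watch the Gerrit event stream and pick out the events a
    # third-party CI would trigger on (patchset-created for openstack/nova and
    # openstack-dev/devstack). Illustration only; account and key are placeholders.
    import json
    import paramiko

    GERRIT_HOST = "review.openstack.org"
    GERRIT_PORT = 29418
    CI_USER = "example-ci"              # placeholder CI account
    CI_KEY = "/path/to/ci_ssh_key"      # placeholder key path
    WATCHED = {"openstack/nova", "openstack-dev/devstack"}

    def watch_events():
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(GERRIT_HOST, port=GERRIT_PORT,
                       username=CI_USER, key_filename=CI_KEY)
        _, stdout, _ = client.exec_command("gerrit stream-events")
        for line in stdout:
            event = json.loads(line)
            if (event.get("type") == "patchset-created"
                    and event["change"]["project"] in WATCHED):
                # A real system would enqueue a devstack/Tempest job here.
                print("would test", event["change"]["url"])

    if __name__ == "__main__":
        watch_events()

In a deployment like the ones discussed here, this matching and job queueing is handled by Zuul's Gerrit connection rather than hand-written code.
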
15:31:53 After a lot of help from asselin, a few patches to his fork, and a bit of tweaking, everything seemed to be running ok for a start. The main problem I kept experiencing was Jenkins failures: about 30-40% of the time, Jenkins would throw an exception and fail the test saying it lost connectivity with a slave.
15:31:54 * sweston does not want to see ctlaugh's power bill
15:32:01 he he
15:32:02 ctlaugh, are you initially focused on running against nova patches?
15:32:12 Yes - for now
15:32:53 We aren't entirely sure what the best thing will be to trigger off of long-term, since we aren't necessarily focused on testing a specific project.
15:33:26 Nova is probably the one single project we are most interested in (for now), but we also want to be able to demonstrate that everything runs on arm64.
15:33:40 ctlaugh: are you using single-use slaves? this was an issue I had when I was starting out with CI .. cruft from previous runs was causing instability
15:33:46 ctlaugh, we are in a similar situation for PowerKVM, since we are re-using the existing libvirt driver
15:34:01 sweston: yes, using single-use slaves
15:35:11 krtaylor: right now, we are focused on KVM + libvirt, but in time, we might also be interested in Xen on arm64 as well. There is a concern that it will become unsupported by OpenStack since no one is testing it.
15:35:27 ctlaugh, currently i'm experimenting with disabling c-states in the bios & power management. seems to be helping so far.
15:35:48 asselin: helping with what?
15:35:55 ctlaugh, yes, we have a different need to show openstack works on our platform instead of just testing a single driver, as most third party ci is focused on
15:36:03 ctlaugh, running jenkins slaves
15:36:12 * asselin looks up link
15:36:23 asselin: ah -- interesting
15:36:48 #link http://support.citrix.com/article/CTX127395
15:37:13 got my ideas from this ^^, although we have different HW, etc.
15:37:49 Right now, I am in the process of replacing the amd64 compute nodes I have been testing with so far, and switching to arm64 compute nodes. I am currently working on the nodepool scripts to get them to deploy on arm64 correctly, and that is probably what will require the most changes to get right.
15:38:05 I have already submitted a change to openstack-infra/puppet-jenkins in order to correct the JDK deployment to not be amd64-specific. It is currently awaiting review.
15:38:34 I have also identified (I think) a few changes that need to be made to openstack-infra/system-config and openstack-infra/project-config (related to how puppet gets installed), though I am still doing some testing.
15:38:45 I don't know yet if that's all that is going to be required, or if I will run into other problems once I try to run on arm64.
15:38:45 ctlaugh, link? I'll review
15:38:46 A big concern is with Jenkins - whether the problems I was seeing there will be about the same as before, or be worse.
15:39:13 * ctlaugh is looking...
15:39:36 #link https://review.openstack.org/#/c/168222/
15:40:06 perfect, thanks
15:40:07 So that's about it...
15:40:10 thanks too
15:40:45 Any other questions (or suggestions)?
15:41:12 ctlaugh, so any ideas on the 30-40% failures, lost connection problems?
15:42:23 krtaylor: not yet. At one point, I thought it might have been related to how much load each of the nodes was under. It _seemed_ to get worse the more VMs I had running on a single nova-compute node.
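
A note on asselin's C-state experiment above (15:35:27): he disabled C-states in the BIOS, but the same idea can be tried from the OS through the Linux cpuidle sysfs interface. The Python sketch below is a minimal, hypothetical illustration of that alternative; it needs root to write the 'disable' files, and whether it actually helps Jenkins slave stability on any given hardware is an assumption, not something verified in the meeting.

    # Sketch: list the cpuidle C-states Linux exposes, and optionally disable the
    # deeper ones (state2 and up) without touching the BIOS. Root is required to
    # write the per-state 'disable' files.
    import glob
    import os

    def list_cstates():
        # Show the states reported for cpu0 and whether each is currently disabled.
        for state_dir in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cpuidle/state*")):
            with open(os.path.join(state_dir, "name")) as f:
                name = f.read().strip()
            with open(os.path.join(state_dir, "disable")) as f:
                disabled = f.read().strip() == "1"
            print(os.path.basename(state_dir), name,
                  "disabled" if disabled else "enabled")

    def disable_deep_cstates(min_state=2):
        # Write '1' to the 'disable' knob for state<min_state> and deeper, on every CPU.
        for disable_path in glob.glob("/sys/devices/system/cpu/cpu*/cpuidle/state*/disable"):
            state_num = int(disable_path.split("/state")[1].split("/")[0])
            if state_num >= min_state:
                with open(disable_path, "w") as f:
                    f.write("1")

    if __name__ == "__main__":
        list_cstates()
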
15:42:25 ctlaugh, in my case I was looking at the nova console.log files for the jenkins slave, which led me to that article about adjusting power states
15:42:51 The amd64 nodes I have are not that great -- the processors are actually atom-based.
15:43:06 ctlaugh, are they the same failures each time, or transient?
15:43:39 When it fails, it's the same each time. I'll post a link to the failure as soon as I find an example.
15:44:45 My hope is that when I start using the more capable arm64 nodes, I'll have fewer failures.
15:45:27 I don't have a convenient way to grab the error text at the moment.
15:45:45 ctlaugh, we had an increase in stability and test execution (reducing skips) by reducing IO blocks
15:46:41 mmedvede on my team has been driving that effort for us, may be a good contact for you
15:46:53 great - thank you
15:47:49 ctlaugh: I can look at errors/logs if you'd like
15:48:02 ctlaugh, any tools you use in your environment that really help you?
15:48:39 ctlaugh, anything you came up with that may help others?
15:48:43 krtaylor: not really anything extra. I use MAAS and Juju to manage deployment of all of the hardware, mainly because that's the easiest way to do it.
15:49:02 Otherwise, so far it's just the bits deployed by asselin's scripts.
15:49:46 If Jenkins doesn't work out, that might change and I may have a replacement to share with others :)
15:50:44 ctlaugh, ok, thanks, that's good to know, feel free to add anything you come up with to the tools index
15:51:00 ok, I will
15:51:19 asselin, one more successful deployment from your repo :)
15:51:34 great! :)
15:52:06 ctlaugh, thanks for the pr's
15:52:25 ok, so any other questions for ctlaugh about his ARM64 test environment?
15:53:55 ctlaugh, thanks for coming and talking about your environment, it's really good to get to know more about others' systems
15:53:56 ctlaugh, thanks for sharing
15:54:09 you're welcome -- it was a pleasure
15:54:28 ok, then
15:54:31 #topic Open Discussion
15:54:41 anything anyone wants to discuss?
15:55:24 * krtaylor notes it is rare that we have time to get to Open Discussion
15:57:20 ok, well if nothing else, then I'll wrap this up
15:57:35 don't forget to add your summit topics to the etherpad
15:58:07 thanks everyone, thanks ctlaugh, another great meeting!
15:58:19 thanks, krtaylor
15:58:34 thanks
15:58:44 thank you, everyone
15:59:01 #endmeeting
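
A sketch related to asselin's debugging tip at 15:42:25 (reading the nova console.log for the Jenkins slave): with python-novaclient of that era, the console output of slave VMs can be pulled through the compute API, so lost-connection failures can be correlated with kernel or power-management messages. The credentials, auth URL, and the "devstack-slave" name prefix are placeholders, not details from the meeting.

    # Sketch: fetch recent console output for Jenkins slave instances so it can be
    # inspected after a lost-connection failure. Credentials and the slave naming
    # scheme below are placeholders.
    from novaclient import client

    nova = client.Client("2",
                         "ci-user", "ci-password",          # placeholder credentials
                         "ci-tenant",
                         "http://keystone.example.com:5000/v2.0")

    for server in nova.servers.list():
        if server.name.startswith("devstack-slave"):        # placeholder naming scheme
            print("=== console for", server.name, "===")
            # Last 100 lines of the instance console log
            print(server.get_console_output(length=100))
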