15:00:48 <krtaylor> #startmeeting third-party
15:00:49 <openstack> Meeting started Wed Feb  4 15:00:48 2015 UTC and is due to finish in 60 minutes.  The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 <openstack> The meeting name has been set to 'third_party'
15:00:58 <omrim> Hello
15:01:02 <mmedvede> o/
15:01:04 <rfolco> Hi
15:01:09 <krtaylor> who is here for the Third Party CI Working Group meeting?
15:01:21 <omrim> omrim
15:01:26 <lennyb> Hello, My name is Lenny, I work for Mellanox and will be working on our CI.
15:01:49 <krtaylor> hi lennyb
15:01:53 <krtaylor> hi omrim
15:02:06 <omrim> krtaylor: Hello :)
15:02:26 <ja> moin moin
15:02:35 <krtaylor> ok, well let's get started, we have a full agenda today
15:02:39 <krtaylor> hi ja
15:02:51 <krtaylor> #topic Third-party CI documentation
15:03:03 <ja> howdy krtaylor
15:03:19 <krtaylor> so, the documentation sprint went well, but we still have a lot of work to do
15:03:45 <krtaylor> omrim, you are going to help us get a FAQ going right?
15:04:19 <omrim> krtaylor: Yes, I will be glad to get some great references
15:04:46 <krtaylor> you can use the etherpad to gather ideas if you wish, or start a new one
15:04:49 <krtaylor> #link https://etherpad.openstack.org/p/third-party-ci-documentation
15:05:05 <krtaylor> there is the link, for everyone else
15:05:26 <rfolco> will revisit there and see what is missing
15:05:38 <krtaylor> ja, I think you have some interesting points in the Meta-comments
15:05:46 <krtaylor> those could be turned into patches
15:06:08 <krtaylor> rfolco, yes please, that would be great
15:06:13 <krtaylor> and would be great too
15:06:54 <krtaylor> we have done a pretty good job on third-party.rst
15:07:13 <krtaylor> but running-your-own.rst is untouched
15:07:20 <krtaylor> it needs the most work
15:07:28 <ja> are there any upcoming milestones that make it harder/easier to do patches at a given time?  I'm unsure how/if the overall release dates affect documentation patches.
15:07:34 <krtaylor> we can divide that up to make progress
15:08:12 <krtaylor> ja, good question, but we are still in the middle of kilo
15:08:28 <krtaylor> we should be good for several more weeks
15:09:03 <krtaylor> and since docs don't hurt anything else, it should be fairly independent of release schedules
15:09:36 <krtaylor> so, any volunteers for running-your-own?
15:09:41 <ja> krtaylor, that's the point of the question... interaction between doc and release dates.  e.g. in w3c there is a "quiet period" at certain points where nothing new gets published.
15:10:05 <ja> ... little/no interaction was the hoped-for state
15:10:10 <krtaylor> ja, oh, yeah I remember that, no nothing like that here
15:10:38 <krtaylor> especially for what we are doing to help external testing
15:11:03 <ja> krtaylor, is the same true of the puppet splits or will those "have" to quiesce "during" the release?
15:11:43 <krtaylor> lennyb, getting your perspective on the documentation would be really good, and we can help you get started in patch writing
15:12:05 <krtaylor> ja, not sure I understand, do you mean for adopting the module split-out?
15:12:07 <ja> ... if the CI work is closer to ... well, CI process... that's ideal for me at least.
15:12:32 <lennyb> ok,
15:12:49 <ja> krtaylor, since puppet is "code" not "doc" it might have to play by different rules - that was the point of that question
15:13:15 <krtaylor> ja, good stuff, but let's focus on docs for the moment, can we discuss that at the end in open discussion?
15:13:26 <ja> sure
15:14:13 <krtaylor> ok, so how do we get the running-your-own to move forward?
15:14:30 <krtaylor> do we need to assign chunks? any volunteers?
15:14:39 <krtaylor> I'll be in, obviously
15:15:16 <rfolco> seems to be huge
15:15:27 <krtaylor> yes, and needs the most work
15:15:52 <krtaylor> So, I'll take the requirements section
15:16:02 <krtaylor> everyone, grab a section in the etherpad
15:16:21 <ja> do we have consensus on what purposes it is intended to serve?  e.g. purely reference, closer to tutorial
15:16:59 <ja> ...I wouldn't really know what to put into a patch until I understand where the wg wants it to land.
15:17:50 <krtaylor> ja, I think refreshing what is there is the first goal, but including links to other parts of the existing documentation is a really good thing
15:18:06 <krtaylor> so, reuse the infra manual as much as possible
15:18:39 <rfolco> I guess for running-your-own it's just an update. You need a balance between detailed/superficial
15:18:51 <krtaylor> we, as a work group, own keeping this document up to date, so minimizing revisions by referencing other infra docs is a priority
15:18:53 <krtaylor> rfolco, yes
15:20:06 <krtaylor> ok, well, I encourage everyone to read that doc and think about a section to re-write, it's a great way to get started in the community
15:20:22 <krtaylor> #link http://ci.openstack.org/running-your-own.html
15:20:28 <krtaylor> just for completeness
15:21:09 <krtaylor> also remember to please set topic to 'third-party-ci-documentation'
15:21:27 <krtaylor> on any patches, makes it easier to track as a group
15:21:51 <krtaylor> ok, any questions on documentation, else we'll move on
15:21:52 <ja> If I see its current form as a reference, and I think a tutorial is what newbies really need, is the net that I should just write it all up as a patch first and then see if reviewers like it?
15:22:19 <krtaylor> ja, smaller is better, just start with a section
15:22:28 <krtaylor> easier to review and merge
15:22:37 <omrim> krtaylor: Great doc thanks
15:23:15 <krtaylor> omrim, thank you for your FAQ patch!
15:23:32 <krtaylor> ok, let's move on, we have a full agenda today
15:23:48 <krtaylor> #topic Splitting out puppet modules
15:24:01 <krtaylor> I left this on the agenda, but just to summarize
15:24:30 <krtaylor> not sure if we have asselin yet, it's early for him
15:24:34 <krtaylor> it was a tremendous success
15:24:39 <krtaylor> and a lot of fun
15:25:15 <mmedvede> +1
15:25:19 <krtaylor> thanks mmedvede for a big part, how many patches?
15:25:33 <mmedvede> I think I did 14 during the sprint
15:25:40 <mmedvede> modules
15:25:46 <mmedvede> ~3 patches per module
15:25:57 <krtaylor> wow, nice
15:26:18 <krtaylor> yeah, I did a search to see and stopped counting after a page
15:26:49 <krtaylor> any thoughts about what went well in the sprint, what didn't?
15:27:21 <mmedvede> The result is that now puppet modules are in their own projects. And we are encouraged to make those modules more consumable by 3rd parties
15:27:49 <mmedvede> krtaylor: the ordering was an issue
15:28:10 <krtaylor> I think pleia2 is going to summarize the virtual sprints in general via email, more teams should consider them
15:28:16 <krtaylor> mmedvede, agreed
15:28:21 <mmedvede> a lot of merge conflicts. We should have figured out a way to avoid this. Other than that, it was efficient
15:28:33 <krtaylor> rebase, rebase, rebase
15:29:00 <krtaylor> it was amazing, good infra core participation was critical
15:29:07 <mmedvede> Infra team helped a lot
15:29:23 <krtaylor> ok, let's move on
15:29:43 <krtaylor> #topic Spec for in-tree 3rd party ci solution
15:30:14 <krtaylor> please review this spec, it is a really good direction and a place where we can get involved
15:30:32 <krtaylor> #link https://review.openstack.org/#/c/139745/
15:31:00 <krtaylor> I know asselin would appreciate any feedback or ideas
15:32:07 <krtaylor> and finally
15:32:15 <krtaylor> #topic Highlighting Third-Party CI Service
15:32:33 <krtaylor> this was an idea we had last year, that I wanted to start doing again
15:33:02 <krtaylor> one of the goals of this work group is to help each other out, and share how we solved the hard problems
15:33:12 <krtaylor> so, to kick this off again
15:33:26 <krtaylor> I figured that I would volunteer my team
15:33:35 <krtaylor> and specifically rfolco
15:33:43 <krtaylor> to come and share what we are doing
15:33:58 <krtaylor> and how we solved some problems in our environment
15:34:33 <krtaylor> so, rfolco it is all yours
15:35:02 <rfolco> sure, I'll just summarize a few topics due to the limitation of this format and time
15:35:12 <rfolco> and make breaks for questions
15:35:35 <rfolco> We'll base the discussions on the article https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/building_your_openstack_3rd_party_ci_system1?lang=en.
15:35:48 <rfolco> For this meeting we're going to focus on the problems we solved and also on the improvements we've made so far.
15:36:06 <krtaylor> #link https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/building_your_openstack_3rd_party_ci_system1?lang=en
15:36:50 <rfolco> so how did we keep our CI from breaking every time OpenStack Infra changed their code?
15:37:01 <rfolco> The success rate of builds in Jenkins increased from ~60% to ~97% after implementing two separate environments: production and development. Before that the CI system was very sensitive and broke more often.
15:37:02 <rfolco> A typical CI system needs custom code and configuration overrides on top of OpenStack Infra code (system-config and project-config). Pinning these projects to a code level on production environment increased stability of our CI jobs significantly.
15:37:22 <rfolco> --pause for questions--
15:37:55 <ja> rfolco, how much of that breakage do you think would be addressed by the puppet work?
15:38:40 <rfolco> I don't have a good estimate number, but I think about 30% more or less
15:38:45 <krtaylor> so, I guess we could also say that we are using the upstream infra ci ported to our environment, our goal was to follow them as closely as possible, but if we followed too close, we were not stable
15:38:57 <ja> ...and thinking out loud, does it suggest that *infra* needs a CI process to avoid breaking 3rd parties
15:39:24 <ja> 30% is pretty substantial
15:39:25 <krtaylor> ja, no, not their concern, it is up to us to address
15:39:32 <rfolco> the problem ja, is that we had in the beginning some code overrides, not only configuration
15:40:02 <rfolco> our fault on designing a stable CI that does not have workarounds or hacks in the code
15:40:12 <rfolco> the right way to do it is to override config
15:40:30 <rfolco> so now we pin the code to a stable code level that we know that works
15:40:44 <ja> +1 on overriding config as the right way
15:40:47 <rfolco> and do experiments / test on a separate environment (dev)
15:40:55 <mmedvede> ja, puppet split work should reduce the amount of custom puppet code we have, this is a big advantage
15:41:04 <krtaylor> ++
15:41:48 <rfolco> ok so moving on
15:41:51 <wznoinsk> hi rfolco
15:41:57 <rfolco> Production services Jenkins, Nodepool and Zuul run on VMs in an x86 cloud. Each of these services has a "clone" for the development environment. This enables a better control of code levels in each service. Another advantage is the ability of creating snapshots for the services.
15:42:27 <rfolco> feel free to ask any questions wznoinsk, this is more fun being interactive :)
15:42:43 <wznoinsk> in what situations config from upstream has to be updated in 3rd party ci? I'm not using Zuul/Puppet at this very moment would there be something I'd need from upstream infra projects in this case?
15:43:28 <rfolco> Well all depends on your needs
15:43:55 <rfolco> if you wanna report back to community, yes, you need to override zuul yaml config with your config
15:44:09 * krtaylor notes a good documentation topic
15:44:36 <rfolco> this is just an example, the article details better the most common overrides you have to make, and what are their purposes
15:45:01 <rfolco> in our case, we started testing Nova project
15:45:27 <wznoinsk> so if I'm only using Jenkins + lxc containers in my case via docker (a basic 3rd party CI setup) and I'm always fetching master of each of openstack projects + the change proposed I can leave without getting infra configs for now?
15:45:30 <rfolco> so we went to zuul yaml, and override upstream yaml with our custom yaml just to test Nova
15:46:05 <wznoinsk> s/leave/live
15:46:58 <rfolco> the key point is: do you need to trigger your test for every patch? Another important thing: if you are stable running upstream code, that's awesome
15:47:27 <rfolco> but in our case we decided to not work with latest upstream code from Infra
15:47:53 <rfolco> It's ok just to use Jenkins
15:48:10 <rfolco> in case you don't need to report back and you define when your tests will run
15:48:30 <rfolco> You can automate your builds this way, nothing wrong...
15:48:43 <wznoinsk> with efficiency of lxc container that's the plan to run on every patchset, also 'stable running upstream code' is a negation of itself ;-)
15:48:59 <rfolco> :)
15:49:18 <krtaylor> wznoinsk, it also depends on what and how many tests you have to run, and how big your system needs to scale
15:49:52 <wznoinsk> I'm still able to comment back on the build, potentially listing tempest/other tests and their success separately in a single comment (without using Zuul), I'm trying to understand whether I should go the Zuul-way already ...
15:49:53 <rfolco> zuul listens on gerrit the patches and queues jobs for you
15:50:12 <rfolco> and reports back
15:50:37 <rfolco> so you define in layout.yaml what to listen, what jobs to trigger and how to report back
15:51:06 <rfolco> wznoinsk, I'll be happy to answer in more detail about this topic...
15:51:32 <wznoinsk> I know it's beneficial to have Zuul when you have multiple proposed code changes that may depend on each other (two independent jenkins jobs wouldn't catch the reliance) but I don't think using Zuul is a must for 3rd party CI to have, is it?
15:52:41 <wznoinsk> sure rfolco, I'll catch you after this meeting
15:52:46 <krtaylor> wznoinsk, I would really like to hear about your environment, can I schedule you for an upcoming week to tell us about it?
15:53:14 <ja> My impression is that the choice of implementation components on your side of the firewall is yours.  In that sense, the ssh mechanism Zuul uses and so on is "interface" not "implementation"
15:53:15 <wznoinsk> krtaylor: yes, with pleasure, thanks
15:53:16 <rfolco> as I said you can run your CI without it, but it's easier with zuul I think
15:53:53 <ja> ...of course, as with any community, using what others use makes it easier to find help and give it
15:54:05 <wznoinsk> ja I'd agree, but without Zuul 3rd party CI tests could be missing some of the code breakages that may happen when two/more code changes can break each other
15:54:21 <krtaylor> rfolco, we had several areas that we made modifications right?
15:54:31 <rfolco> yes,
15:54:43 <rfolco> The customizations required to run a 3rd-party CI include changes in: layout.yaml (Zuul); nodepool.yaml (Nodepool); projects.yaml & devstack-gate.yaml (Jenkins Job Builder).
15:54:43 <rfolco> To override code and configurations two internal Git repositories have been created: (1) puppet-config, which overrides code and configuration; and (2) ibm-devstack-gate, which contains additional customizations for devstack-gate jobs such as regex file (skip list for Tempest), pre_test_hook.sh and Swift upload log script.
15:55:12 <rfolco> Both internal repositories (puppet-config and ibm-devstack-gate) work in different branches: dev (development) and master (production). To turn reporting off on production one needs to checkout puppet-config master branch and comment out the lines "success" and "failure" in layout.yaml. Changing base image or memory requirements for slaves in Development Nodepool, one would change nodepool.yaml
15:55:12 <rfolco> in the repository but push changes to dev branch instead. The same idea applies to JJB configuration files. It's also possible to checkout ibm-devstack-gate and modify regex file and include a new test to the skip list for Tempest runs.
15:56:17 <rfolco> some of the customizations and where we spent more time on our CI:
15:56:20 <rfolco> (1) Build the cloud infrastructure for the services and compute nodes (2) Install and configure services using Puppet (3) Build CirrOS for Power with mainline kernel (4) Skip Tempest failures (5) Resolve devstack-gate problems on Power platform by overriding config (6) Build custom MySQL to resolve issues on Power (7) Map and debug Tempest failures (8) Investigate concurrency problems for Tempest
15:56:20 <rfolco> (9) Cleanup database for expired keystone entries and deleted instances (our current problem this week) (10) Upload script to Softlayer Swift.
15:57:32 <rfolco> I'm open for anybody to ping me after the meeting and detail better any of these
15:57:50 <rfolco> back to you Kurt since time is over :)
15:57:53 <rhe00_> rfolco: thanks for writing up the article. Very informative.
15:58:07 <krtaylor> the blog illustrates this pretty well
15:58:27 <wznoinsk> rfolco: thanks for getting it all together, it's a compact version of what you should know about 3rd party ci, learned a few new things from it as well
15:58:40 <rfolco> that's great to hear
15:58:54 <krtaylor> well, we are still learning how to do this, but rfolco, this was a really good kickoff
15:59:01 <rfolco> thanks guys.. wznoinsk if you wanna discuss zuul role better let me know
15:59:24 <krtaylor> I hope to get every CI team to come and share how they fixed problems for their environment
15:59:43 <krtaylor> we are out of time, thanks everyone
15:59:55 <rfolco> thx
15:59:55 <krtaylor> really good meeting, see you next time
16:00:34 <krtaylor> #endmeeting