15:01:00 <krtaylor> #startmeeting third-party
15:01:01 <openstack> Meeting started Wed Feb 18 15:01:00 2015 UTC and is due to finish in 60 minutes.  The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:04 <openstack> The meeting name has been set to 'third_party'
15:01:11 <krtaylor> Hi everyone
15:01:17 <patrickeast> hey
15:01:17 <ja_> moin moin
15:01:18 <mmedvede> hi
15:01:29 <omrim> Hello
15:01:34 <krtaylor> time for another Third Party CI WG meeting
15:02:01 <lennyb> hi
15:02:38 <krtaylor> looks like we have a good group today
15:02:39 <rfolco> o/
15:03:01 <krtaylor> here is the agenda:
15:03:05 <krtaylor> #link https://wiki.openstack.org/wiki/Meetings/ThirdParty#2.2F18.2F15_1500_UTC
15:03:51 <krtaylor> #topic Announcements
15:04:19 <asselin_> hi
15:04:32 <krtaylor> I'll start off by reminding everyone of gerrit being upgraded March 21st
15:04:41 <krtaylor> hi asselin
15:05:02 <ja_> what is the "impact" of the upgrade to users?
15:05:32 <asselin_> ja_, for those with firewalls blocking the port, it means firewall updates
15:05:39 <krtaylor> ja_, really only if your CI system needs some kind of firewall egress configuration
15:05:48 <ja_> ok thx
15:05:51 <krtaylor> asselin beat me to it
15:06:20 <krtaylor> ok, any other quick announcements before we move on?
15:07:09 <krtaylor> #topic Third-party CI documentation
15:07:59 <krtaylor> ok, so we still have some work to do here, especially in running-your-own
15:08:41 <krtaylor> and I am thinking that it has slowed due to the need for someone to walk through it
15:09:26 <krtaylor> we have some patches, which reminds me
15:09:57 <krtaylor> rfolco, can you change your topic to 'third-party-ci-documentation' on your patch
15:10:20 <krtaylor> that is for everyone  - so we can track all with one query
15:10:32 <krtaylor> #link https://review.openstack.org/#/q/topic:third-party-ci-documentation,n,z
15:10:37 <rfolco> kragniz, summary line ?
15:10:46 <rfolco> krtaylor, ^
15:10:59 <rfolco> (sorry)
15:11:25 <krtaylor> rfolco, you should have a little writing pad next to Topic in the upper left of your patch review
15:11:35 <krtaylor> you can edit the topic in gerrit
15:12:11 <rfolco> topic is master, is that one ?
15:12:53 <krtaylor> rfolco, just below Branch
15:13:12 <rfolco> done https://review.openstack.org/#/c/155864/
15:13:16 <krtaylor> thanks
15:13:45 <krtaylor> lennyb, since you are getting started, any comments on the running-your-own doc would be helpful as well
15:14:22 <krtaylor> ja_, your continued input is appreciated too
15:14:51 <krtaylor> ok, onward
15:15:03 <krtaylor> #topic Spec for in-tree 3rd party ci
15:15:14 <lennyb> krtaylor, thanks I will try to document
15:15:26 <krtaylor> asselin, you have a new rev on the spec
15:15:28 <asselin_> so I updated the spec a bit yesterday
15:15:39 <krtaylor> #link https://review.openstack.org/#/c/139745/
15:15:56 <asselin_> yes, it is now a 'priority effort' for openstack-infra
15:16:02 <krtaylor> I haven't had a chance to review it yet, will today
15:16:17 <krtaylor> yea!
15:16:34 <asselin_> yes, very excited about that! :)
15:17:18 <krtaylor> asselin, it will enable a lot of goodness
15:17:44 <rfolco> asselin, is the refactor an override on top of infra puppet, a fork, or something else? could you please clarify?
15:18:23 <asselin_> rfolco, the refactor is to allow the puppet scripts to be more easily reused
15:19:04 <asselin_> rfolco, there are lots of sections in system-config that are needed, but not easily reusable
15:19:51 <rfolco> I read the spec and I had the impression it was a fork from infra scripts
15:20:15 <asselin_> that's today's solution
15:20:41 <asselin_> rfolco, could you comment on the specific sections? I will try to clarify
15:21:12 <rfolco> asselin, will do, thx
15:21:37 <krtaylor> yes, and just like the puppet module split-out, a great opportunity for third-party WG to get involved and help out
15:21:38 <asselin_> #link https://review.openstack.org/#/c/137471/
15:21:49 <asselin_> ^^ this is a related spec that will help a lot as well
15:23:20 <krtaylor> any other questions on in-tree ?
15:23:34 <krtaylor> an action for everyone, please go read the spec
15:24:04 <krtaylor> thanks for the overview asselin
15:24:29 <krtaylor> oops asselin_
15:24:41 <asselin_> np :)
15:24:49 <krtaylor> ok, next
15:24:52 <krtaylor> #topic Repo for third party tools
15:25:09 <krtaylor> like last week, I am socializing the idea of creating a repo
15:25:41 <krtaylor> a place for ci teams to share their scripts and other goodies that make their job easier
15:26:19 <asselin_> +1 I like the idea
15:26:21 <krtaylor> if the consensus is that it is a good idea, I'll see about getting that setup
15:26:51 <patrickeast> i like this idea, are you thinking an openstack repo or just a public github kind of thing?
15:26:57 <krtaylor> there are several tools available, but unless you know about someone's github account, you don't know they exist
15:27:18 <krtaylor> actually, a stackforge repo
15:27:29 <patrickeast> gotcha
15:27:46 <krtaylor> we'd have to discuss the organization of it, etc
15:28:19 * krtaylor wonders if we'd need a spec to propose the idea formally
15:28:54 <patrickeast> that might be a good idea, or at least a wiki page or something to capture what we decide for organization
15:29:08 <krtaylor> well, I haven't found anyone that hated the idea...yet
15:29:40 <asselin_> A wiki or etherpad might be good to start.
15:29:52 <krtaylor> I know we have tools internally that we could share, scripts, dashboards, etc
15:29:59 * krtaylor agrees
15:30:38 <krtaylor> #action krtaylor to set up wiki for third-party CI WG repo, and/or etherpad
15:31:46 <krtaylor> ok, goodness, thanks everyone for the input
15:31:54 <krtaylor> lets move on
15:31:59 <krtaylor> #topic Restart monitoring dashboard effort
15:32:05 <krtaylor> sweston, ping?
15:32:48 <krtaylor> there is an effort to have a public monitoring dashboard, basically a new/nicer/more featured radar
15:33:20 <krtaylor> some good comments here:
15:33:26 <krtaylor> #link https://review.openstack.org/#/c/135170/
15:34:01 <krtaylor> it would replace the need to change status on ThirdPartySystems, at least eventually
15:35:01 <krtaylor> sweston has been swamped with work from his day job, but everything is available; it just needs input, reviews, and ideas to converge
15:36:00 <krtaylor> ok, we can come back to that if time permits, but I want to get to the next bit
15:36:47 <krtaylor> #topic Highlighting Third-Party CI Service
15:37:15 <krtaylor> continuing on the success of rfolco 's discussion of PowerKVM CI
15:37:37 <krtaylor> this week we have Pure Storage CI
15:38:01 <krtaylor> patrickeast, can you share a brief intro on your system
15:38:06 <patrickeast> yep
15:38:16 <krtaylor> maybe some problems you had and how you solved them
15:38:35 <patrickeast> so, first off i made some stuff to share
15:38:41 <patrickeast> http://ec2-54-67-119-204.us-west-1.compute.amazonaws.com/ci_stuff.svg
15:38:49 <patrickeast> https://github.com/patrick-east/os-ext-testing-data
15:38:58 <patrickeast> a poorly drawn diagram of our setup
15:39:04 <patrickeast> and what we use for our data repo
15:39:23 <krtaylor> #link http://ec2-54-67-119-204.us-west-1.compute.amazonaws.com/ci_stuff.svg
15:39:31 <krtaylor> #link https://github.com/patrick-east/os-ext-testing-data
15:39:49 <patrickeast> about a month (almost 2) ago i switched our system over to using asselin’s https://github.com/rasselin/os-ext-testing scripts
15:40:12 <krtaylor> nice
15:40:16 <patrickeast> prior to that we had started with the instructions on jay pipes' blog post and cobbled together a system without nodepool
15:40:28 <patrickeast> we ran into all kinds of issues with re-using static slaves though
15:40:38 <asselin_> nice pic
15:40:56 <krtaylor> yep, without nodepool due to setup requirements?
15:41:12 <patrickeast> we went that way originally just due to lack of knowing any better
15:41:35 <krtaylor> ah, ok
15:42:00 <asselin_> patrickeast, what's "RDO"?
15:42:13 <krtaylor> RedHat Repo?
15:42:17 <patrickeast> https://openstack.redhat.com/Main_Page
15:42:31 <patrickeast> it's like their open source version of the redhat openstack stuff
15:42:41 <patrickeast> similar to centos vs rhel
15:43:13 <patrickeast> it made it very very easy to get setup with openstack
15:44:12 <patrickeast> so, as you may have noticed on the diagram we have the nice high speed data connections that are currently not used… that's on my list of todos
15:44:20 <patrickeast> we are testing our cinder driver
15:44:22 <krtaylor> patrickeast, so I take it you are only testing cinder patches
15:44:27 <patrickeast> correct
15:44:37 <patrickeast> right now we are only listening for openstack/cinder changes on master
15:44:56 <patrickeast> and run the volume api tempest tests
15:45:31 <patrickeast> we are planning to add a FC cinder driver for our array in L-1
15:45:48 <patrickeast> so i’ll be adding support for that into the system early in L
15:46:02 <krtaylor> what was a really tricky part that you had to work through?
15:46:37 * asselin_ has fc ci scripts to share in some future repo tbd
15:47:08 <patrickeast> probably the hardest part was figuring out how to properly configure everything… all told there are like 50 config files involved between the openstack provider and ci system
15:47:33 <patrickeast> this is where that documentation push is really going to shine
15:48:09 <asselin_> patrickeast, does rdo help out with the openstack provider configs? or are those the ci configs that point to the provider?
15:48:48 <patrickeast> it does get everything set up and working, but we’ve had to go back through and customize things a bit
15:49:00 <patrickeast> like where nova stores instances, and glance keeps images
15:49:05 <patrickeast> due to partitioning on the system
15:49:17 <patrickeast> and we had to delete all of its automatic network setup and do our own
15:49:36 <krtaylor> patrickeast, we had a similar situation, but as we worked through everything, we found ways to use upstream as-is and have less delta
15:50:49 <patrickeast> yea my goal is to try and reduce that when we add in the FC testing
15:51:10 <patrickeast> right now it's only a single initiator we test with
15:51:29 <patrickeast> i’ve got 2 more even bigger ones on the rack next to it waiting to be hooked up with the array
15:51:48 <patrickeast> for those ones i’m hoping to improve upon the current setup a bit
15:52:00 <krtaylor> patrickeast, have you automated any other parts of the system for your testing?
15:52:41 <patrickeast> nothing significant, we’ve added in some scripts to clean up the array once it is done testing
15:53:07 <krtaylor> created any monitoring framework?
15:53:20 <patrickeast> actually yea, a little one
15:53:29 <patrickeast> https://github.com/patrick-east/os-ext-testing-data/tree/master/tools/server_monitor
15:53:31 <patrickeast> so
15:53:38 * asselin_ looking
15:53:44 <patrickeast> we ran into a few times where the system would start failing for X reason
15:53:48 * krtaylor looks too
15:53:48 <patrickeast> either disk out of space
15:53:56 <patrickeast> or the job was unregistered
15:54:01 <patrickeast> or the array went down
15:54:20 <patrickeast> so i made a little script that sits on the master and sends email alerts whenever something like that happens
15:54:30 * patrickeast doesn’t know how to use nagios
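For anyone who wants to roll a similar watchdog before wiring up nagios, a minimal sketch in Python 3 might look like the following; the addresses, disk threshold, and local SMTP relay are assumptions, and patrickeast's real server_monitor script (linked below) covers more failure modes such as unregistered jobs and the array going down.

    #!/usr/bin/env python3
    # Minimal CI-master watchdog sketch: check free disk space and mail an
    # alert through a local SMTP relay. Addresses and threshold are made up.
    import shutil
    import smtplib
    from email.message import EmailMessage

    ALERT_TO = "ci-admin@example.com"      # hypothetical address
    ALERT_FROM = "ci-monitor@example.com"  # hypothetical address
    MIN_FREE_GB = 10                       # arbitrary threshold

    def check_disk(path="/"):
        """Return a problem description, or None if the disk looks healthy."""
        usage = shutil.disk_usage(path)
        free_gb = usage.free / (1024 ** 3)
        if free_gb < MIN_FREE_GB:
            return "only %.1f GB free on %s" % (free_gb, path)
        return None

    def send_alert(problem):
        """Email the alert through the MTA running on the master."""
        msg = EmailMessage()
        msg["Subject"] = "[third-party CI] alert: " + problem
        msg["From"] = ALERT_FROM
        msg["To"] = ALERT_TO
        msg.set_content("The CI monitor found a problem:\n\n" + problem)
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        problem = check_disk("/")
        if problem:
            send_alert(problem)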
15:54:43 <krtaylor> hehheh
15:54:53 <krtaylor> we are looking at using nagios also
15:55:18 <krtaylor> hm, something else to share with the community...
15:55:44 <patrickeast> my company's IT and internal ci teams use nagios quite a lot, so i’m hoping to get their help one day to make this integrated with all their dashboards and stuff
15:55:58 <asselin_> what does -infra use?
15:56:00 <patrickeast> but for now it's nice to get an email instead of checking in and seeing it failed the last 50 builds
15:56:34 <krtaylor> asselin_, eyeballs :)
15:56:56 <asselin_> I configured zuul to send me e-mails on job status. I check the results periodically.
15:57:03 <asselin_> also a good way to fill up your mailbox
15:57:20 <krtaylor> +1000, yeah we did that, then turned it off
15:57:31 <asselin_> but looking for something better. thanks patrickeast :)
15:57:50 <krtaylor> excellent, thanks for sharing that
15:57:53 <patrickeast> oh, also, not sure if anyone else is interested, but https://github.com/patrick-east/os-ext-testing-data/blob/master/tools/clean_purity.py is the cleaning script; it parses the cinder logs for any volumes/hosts/whatever from that particular test run and wipes out anything left over
15:58:12 <patrickeast> we have a relatively small max volume limit on our arrays so it becomes an issue if we aren’t aggressive about it
15:58:15 <krtaylor> #link https://github.com/patrick-east/os-ext-testing-data/tree/master/tools/server_monitor
15:58:37 <krtaylor> #link  https://github.com/patrick-east/os-ext-testing-data/blob/master/tools/clean_purity.py
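A rough sketch of the log-scraping idea behind such a cleaner is below; the c-vol log path, the volume naming pattern, and the delete call are all assumptions for illustration, not the actual logic of clean_purity.py, which talks to the Pure Storage array directly.

    #!/usr/bin/env python3
    # Sketch: scan the cinder volume log for resources created during a test
    # run and remove any that were left behind on the backend.
    import re

    CINDER_LOG = "/opt/stack/logs/c-vol.log"          # typical devstack location (assumption)
    VOLUME_RE = re.compile(r"volume-[0-9a-f-]{36}")   # hypothetical naming pattern

    def volumes_mentioned(log_path):
        """Collect every volume name that shows up in the cinder volume log."""
        names = set()
        with open(log_path) as log:
            for line in log:
                names.update(VOLUME_RE.findall(line))
        return names

    def delete_leftover(name):
        """Placeholder: call the backend's API/CLI to remove the volume."""
        print("would delete leftover volume:", name)

    if __name__ == "__main__":
        for name in sorted(volumes_mentioned(CINDER_LOG)):
            delete_leftover(name)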
15:58:40 <lennyb> you can also query Jenkins with the json api to see the last job's status
15:59:23 <patrickeast> yep, that's all the server_monitor script does, although i dialed it back to just alerting when things hit the fan (health score of 0)
15:59:36 <patrickeast> since i was tired of emails for actual failures on bad patches
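lennyb's suggestion of polling the Jenkins JSON API can be as small as the sketch below; the Jenkins URL and job name are placeholders.

    #!/usr/bin/env python3
    # Sketch: ask Jenkins for the result of the last build of a job via its
    # JSON API (/job/<name>/lastBuild/api/json).
    import json
    import urllib.request

    JENKINS_URL = "http://jenkins.example.com:8080"   # placeholder
    JOB_NAME = "dsvm-tempest-my-driver"               # placeholder

    def last_build_result(job):
        """Return Jenkins' result string (SUCCESS, FAILURE, ...) for the last build."""
        url = "%s/job/%s/lastBuild/api/json" % (JENKINS_URL, job)
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        return data.get("result")  # None while the build is still running

    if __name__ == "__main__":
        print(JOB_NAME, "->", last_build_result(JOB_NAME))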
15:59:57 <krtaylor> well, we are close to time
16:00:06 <lennyb> yeah, we now get emails only after N failed jobs
16:00:23 <krtaylor> thank you patrickeast for sharing this info about your system
16:00:26 <patrickeast> np
16:00:36 <asselin_> big thanks!
16:00:36 <patrickeast> let me know if you guys have more questions
16:00:41 <krtaylor> thanks everyone, great meeting!
16:00:44 <mmedvede> patrickeast: thank you, very good
16:00:56 <asselin_> patrickeast, a few. will ask offline
16:01:01 <krtaylor> #endmeeting