#openstack-meeting log

17:00:38 <krtaylor> #startmeeting third-party
17:00:39 <openstack> Meeting started Tue Jul 21 17:00:38 2015 UTC and is due to finish in 60 minutes.  The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:43 <openstack> The meeting name has been set to 'third_party'
17:00:52 <asselin> o/
17:00:57 <krtaylor> who's here for the third party CI working group?
17:00:59 <mmedvede> o/
17:01:10 <krtaylor> hi asselin , mmedvede
17:01:54 <krtaylor> asselin, thanks again for running the last meeting
17:02:12 <patrickeast> hi
17:02:13 * krtaylor feels relaxed after vacation time off
17:02:21 <krtaylor> hi patrickeast
17:02:28 <asselin> you're welcome
17:03:16 <krtaylor> I realized that I put the wrong date on the agenda, glad you all are here anyway
17:03:24 <krtaylor> here's the agenda
17:03:27 <krtaylor> #link https://wiki.openstack.org/wiki/Meetings/ThirdParty#7.2F21.2F15_1700_UTC
17:03:57 <krtaylor> #topic Announcements
17:04:06 <krtaylor> I don't have any, none listed
17:04:17 <krtaylor> anyone have anything to quickly announce?
17:04:38 <krtaylor> deadlines? news?
17:05:13 <krtaylor> #topic Common CI Vsprint
17:05:29 <krtaylor> looks like it went well, and now 4 are done!
17:05:48 <asselin> hi, yes, didn't finish, but made a lot of progress
17:06:13 <asselin> nodepool is the most challenging one because it involves changes to nodepool itself
17:06:27 <krtaylor> asselin, what are you thinking for the next steps?
17:06:36 <krtaylor> want to have a second vsprint?
17:07:08 <asselin> krtaylor, not thinking about that. I think we can just do normal reviews and get it done that way
17:07:48 <krtaylor> asselin, fair enough
17:07:51 <asselin> part of the issue is that we are too dispersed geographically for nodepool, so it's difficult to iterate
17:08:03 <mmedvede> I had a general question on the sprint. I noticed some refactoring/move patches were not just moving things, but combining more changes in a single patch
17:09:03 <mmedvede> I think it did slow things down, e.g. my patch (low priority) had a comment to have things changed, in comparison to how they were in system-config
17:09:06 <mmedvede> #link https://review.openstack.org/#/c/199790/
17:09:16 <asselin> mmedvede, yes, I try to limit & enforce scope, but definitely need to do better with that
17:09:36 <krtaylor> it would be good to limit the initial drop to be just the refactoring to make it work
17:09:55 <mmedvede> asselin: ok, good to know. I wanted to do the move without regression in one patch, and any improvements in different ones
17:10:44 <mmedvede> krtaylor: +1
17:10:44 <asselin> mmedvede, there are exceptions of course, but that is what we should aim for
17:11:16 <asselin> perhaps submitting follow-up patches and commenting with 'done in patch#' can help
17:11:32 <mmedvede> I feel there is sometimes a push to do more than necessary in a single patch, not sure why
17:12:43 <krtaylor> as long as we agree, then comments to split out work will be supported
17:13:12 <krtaylor> why would someone refuse a higher patch count?  :)
17:13:46 <asselin> I think we need to stand stronger on that.
17:14:44 <asselin> I will add comments to that end: separate refactor from improvement patch.
17:15:12 <krtaylor> #agreed Take a stronger review position on common ci patches that do more than minimal refactoring
17:15:17 <mmedvede> asselin: thank you
17:15:31 <asselin> mmedvede, thanks for bringing it up
17:15:47 <krtaylor> hm, not sure agreed worked, whatever
17:15:56 <krtaylor> we'll see in logs
17:16:08 <krtaylor> asselin re: iterate on nodepool, not sure I understood that
17:16:42 <asselin> just mean working through the patch review cycle https://etherpad.openstack.org/p/common-ci-sprint
17:16:57 <asselin> there are quite a few interrelated patches
17:17:18 <asselin> those are definitely more than a refactor
17:17:46 <asselin> but necessary to not have the nodepool.yaml file being a template
17:18:08 <asselin> so it's more of an improvement followed by a refactor
17:18:15 <krtaylor> so just quicker reviews to land everything in one group
17:18:45 <krtaylor> I understand what you are saying
17:19:30 <krtaylor> actually, that sounds like a good exercise for a vsprint, with lots of ci and infra involvement
17:19:56 <krtaylor> or at least a focus hours during/after an infra meeting
17:21:39 <asselin> honestly, we got quite a bit done prior to the virtual sprint, so I think we should do that by keeping reviews & testing active
17:21:54 <krtaylor> asselin, your call, let us know how we can help
17:22:23 <mmedvede> asselin: any idea why this one did not merge? https://review.openstack.org/#/c/199737/
17:22:36 * asselin looks
17:23:04 <mmedvede> might need a re-nudge, maybe gerrit had problems at the time
17:23:21 <asselin> yes, seems like it
17:25:11 <asselin> oh I see....its depends-on is still in review
17:25:55 <mmedvede> asselin: good catch
17:26:42 <krtaylor> anything else for common ci?
17:26:45 <asselin> i've nothing else
17:27:09 <mmedvede> krtaylor: proceed
17:27:10 <krtaylor> #topic Spec to have infra host monitoring dashboard
17:27:42 <krtaylor> so this was moving well, no major problems
17:28:02 <krtaylor> I learned about another dashboard
17:28:29 <mmedvede> #link https://review.openstack.org/#/c/194437/
17:28:32 <krtaylor> jogo wrote lastcomment
17:28:44 <krtaylor> thanks mmedvede
17:28:50 <krtaylor> #link http://jogo.github.io/lastcomment/
17:29:23 <wznoinsk> krtaylor: sorry to inject it here, did you look into having nagios + nagstamon (as a desktop app instead of dashboard) ?
17:30:20 <krtaylor> wznoinsk, no, although I'm not sure it's a bad idea
17:30:47 <krtaylor> but it would need to be a service that infra would host
17:31:12 <krtaylor> that way, it would be available, else it would be dependent on someone privately hosting it
17:31:13 <mmedvede> krtaylor: jhesketh suggested to rename the spec files to avoid confusion, I do not think it has been addressed
17:31:22 <wznoinsk> nagstamon is just an app on your workstation; it sits in your systray and polls the nagios server (over http) for any alerts the nagios server is seeing
17:32:04 <krtaylor> mmedvede, I changed the topic, I didn't agree  :)  Also, we are moving the original spec...eventually
17:32:06 <wznoinsk> krtaylor: anyways, we can take it offline, I've got some experience with that and really like that (over dashboards or emails)
17:32:13 <krtaylor> sweston, are you around?
17:33:10 <sweston> krtaylor: yes, sir
17:33:32 <krtaylor> wznoinsk, it would mean that someone would have to install that to see the history of a system that just posted a failed comment
17:33:34 <sweston> reading backlog
17:34:01 <krtaylor> wznoinsk, I'd think that a page would be easier for a dev to hit to see if a system was off in the weeds
17:34:24 <wznoinsk> krtaylor: nagstamon is just a desktop version of what you normally see on nagios dashboard(s)
17:34:50 <wznoinsk> you can use nagios webpages for 'non-infra' (devs)
17:34:58 <krtaylor> sweston, thanks for joining us! I had pinged you yesterday, was wondering if you had a chance to see if a patch could change projects in gerrit
17:35:20 * asselin will be back in a few
17:35:33 <jogo> krtaylor: I tried to keep lastcomment as simple as possible, it is a tiny python requests script that runs from a cron job right now
17:35:42 <sweston> krtaylor: no, unfortunately I have not had any spare time at all.  I might be able to get to it later in the week
17:35:58 <krtaylor> wznoinsk, I am certainly open to suggestions, I know that others are using nagios (we are too)
17:36:31 <sweston> krtaylor: actually, I am getting ready to upgrade my systems again, so today or tomorrow would be a good time to test this
17:36:37 <krtaylor> sweston, thanks, let me know if you can't, but it would be good to keep all the history and comments with the original spec
17:36:50 <wznoinsk> krtaylor: let's talk some other time as it's go-home time for me already, I'll catch you on #openstack-infra if you don't mind
17:36:52 <mmedvede> wznoinsk: nagios is generally good at monitoring multiple hosts, I am confused how it can be used to show status of third-party CIs
17:37:06 <krtaylor> sweston, else, we can capture the test and include it with a txt file when it is moved to third-party-ci-tools
17:37:38 <krtaylor> jogo, thanks, I do really like the layout, clean and simple
17:37:47 <sweston> krtaylor: yes, we can consider that as a last pass option
17:37:54 <wznoinsk> mmedvede: nagios is powerful: you can use any script or program (bash, python, perl, java or whatever you pick) to check 'a thing' for you and feed status back to nagios
17:38:41 <krtaylor> wznoinsk, thanks for coming, and let me know what you are thinking, if it provides more with its framework, it may be useful for other tasks in infra as well
17:38:57 <wznoinsk> mmedvede: krtaylor: I'm a huge fan of nagios so I'll sound like everything is doable in nagios, because it is in my opinion :-)
17:39:10 <mmedvede> wznoinsk: I see, so you basically suggest not to write a web frontend, but reuse nagios
17:39:16 <wznoinsk> krtaylor: sure, I'll catch you in the week
17:39:22 <krtaylor> wznoinsk, it is a good tool, and widely used
17:39:30 <krtaylor> wznoinsk, thanks
17:39:44 <wznoinsk> mmedvede: web frontend is probably not the strongest part of nagios especially for wider (than just infra) public so it may still be needed
17:39:59 <wznoinsk> but the infra part should be well covered by nagios out of the box
17:40:13 * krtaylor is trying to keep up with the different threads
17:40:24 <mmedvede> wznoinsk: ok, now I am more confused. Would definitely want to learn more about what you suggest :)
17:40:48 <mmedvede> wznoinsk: I think we might be talking about monitoring different things
17:40:57 <krtaylor> sweston, I'll ping you tomorrow and see where you are at, I think it would help speed the infra hosting spec if it were moved
17:41:19 <jogo> krtaylor: and the code is fairly compact too
17:41:21 * asselin returns
17:41:26 <sweston> krtaylor agreed
17:42:21 <krtaylor> jogo, it would be ideal if we could combine efforts with patrickeast  and have a super simple dashboard
17:42:59 <krtaylor> jogo, patrickeast - have either of you compared with the other dashboard?
17:43:21 <wznoinsk> mmedvede: I monitor my 3rdparty CI with a script under nagios, not sure what exactly you want to monitor, could you ping me the spec pls?
17:43:39 <patrickeast> yea we’ve chatted a bit about them
17:44:13 <mmedvede> wznoinsk: this is the spec in discussion: https://review.openstack.org/#/c/194437/
17:44:31 * krtaylor has not had the time to do a functional comparison of dashboards
17:44:32 <patrickeast> i think one issue is that they are kind of targeting different audiences, so they prioritize different features/designs
17:44:56 <patrickeast> mine is more focused on someone who wants to troubleshoot a ci system and know what/where things broke
17:45:05 <jogo> patrickeast: yeah, it would be possible to make two views. I think the big difference is how we collect data
17:45:21 <mmedvede> wznoinsk: this is patrickeast 's dashboard #link http://ec2-54-67-102-119.us-west-1.compute.amazonaws.com:5000/?project=openstack%2Fnova&user=&timeframe=24&start=&end=
17:45:29 <jogo> I just use the gerrit REST API periodically, patrickeast wanted real time data so gerrit stream ... which makes things a lot more complex
17:45:47 <jogo> I am happy to see any solution that is agreed upon
17:45:58 <jogo> I would be more than happy to stop running lastcomment
17:46:12 <patrickeast> haha, same boat here, i’m ok either way too
17:46:20 <patrickeast> i just want *something*
17:46:37 <wznoinsk> mmedvede: yes, I like patrickeast's work, even if that dashboard saw day light for the first time (and it was more basic)
17:46:45 <krtaylor> jogo, patrickeast thanks for your flexibility
17:47:42 <krtaylor> jogo, as patrickeast said, yours is useful too, I don't see why you'd have to stop running it, unless you wanted to
17:47:53 <wznoinsk> I do understand there will be different criteria you'd be scoring the CIs on (i.e.: how many times per 10 a CI failed, how many times per 10 a 3rdparty CI failed when upstream jenkins DID NOT etc.)
17:47:55 <patrickeast> krtaylor: +1
17:48:06 <krtaylor> but having everyone jump in on maintaining one dashboard and improving it would be MUCH better
17:48:20 <patrickeast> jogo: i think yours is actually more useful for someone who just wants to know what systems are ok for a project at a glance
17:48:32 <patrickeast> it's just harder to figure out *why* they are broken
17:48:42 <patrickeast> but that doesn’t matter to 99% of the openstack devs
17:48:44 <sweston> krtaylor: +1 .  I would prefer this as well, and would rather have folks contributing to radar
17:48:59 <wznoinsk> mmedvede: with more criteria and logic you want a more advanced dashboard, or easy to read dashboards 'per problem'
17:49:21 <asselin> sweston has a good point
17:49:40 <krtaylor> I think the most important thing is that we all agree, very quickly, on one dashboard to get the hosting spec done
17:49:43 <jogo> patrickeast: right, that is exactly what I had in mind when I put it together
17:49:51 <asselin> these others are supposed to be 'temporary'
17:50:03 <krtaylor> then we can start working on the full featured solution
17:50:03 <mmedvede> wznoinsk: I want something I can use to detect anomalies in our CI compared to others. patrickeast 's scoreboard is currently sufficient for our usecase
17:50:28 <jogo> patrickeast: it wouldn't be too hard to add an option to my dashboard to list failed cases when you click something
17:50:39 <mmedvede> wznoinsk: I do not necessarily want 'automagic' detection, I just want data presented in a consumable way
17:51:18 <krtaylor> ok so do we have agreement to stay with scoreboard for now? else we prob have to wait 2 weeks to see any movement
17:52:28 <krtaylor> jogo, would you be willing to move yours to third-party-ci-tools so others could contribute through the gerrit process?
17:52:43 <jogo> krtaylor: no problem
17:53:09 <mmedvede> If we would ask infra to deploy scoreboard or lastcomment, where would puppet modules go
17:53:11 <mmedvede> ?
17:53:36 <krtaylor> mmedvede, it's in the spec (roughly)
17:53:48 <mmedvede> krtaylor: yes, sorry
17:53:55 <jogo> krtaylor: I don't think it makes sense to have infra host a temporary solution
17:54:19 <krtaylor> jogo, actually, that was their suggestion
17:54:25 <jogo> in the time we are sorting out a 'temporary' thing a final thing could be done
17:54:47 <krtaylor> they wanted something now, and it would get all the structure in place
17:55:18 <krtaylor> that's really the catch here, we need a basic solution NOW
17:55:27 <mmedvede> krtaylor: +1 for NOW
17:56:32 <krtaylor> ok, well, we are running out of time for the meeting, move to email thread? or a quick decision?
17:56:37 <wznoinsk> mmedvede: I'm afraid that checking the dashboard for all projects, all CIs in each project, scanning it once or a few times a day, takes a lot of human cycles; I'd prefer to have a check and threshold defined that lets me know
17:57:10 <krtaylor> wznoinsk, it would really only be visited when a dev got a neg comment
17:57:21 <krtaylor> and it would be filtered by project
17:57:36 <krtaylor> just meant to see if a system is off in the weeds or not
17:58:07 <krtaylor> and for that reason, I'd choose patrickeast 's if I had to pick one today
17:58:34 <wznoinsk> krtaylor: yes, it's good for ad hoc checks
17:58:37 <krtaylor> it's easy to compare a system's results to everyone else that tested the same patch
17:58:52 <krtaylor> so do we agree?
17:59:53 <mmedvede> +1
17:59:53 <krtaylor> out of time
18:00:08 <krtaylor> I'll move to email thread
18:00:19 <krtaylor> thanks everyone, really good meeting
18:00:33 <krtaylor> #endmeeting