17:00:38 #startmeeting third-party
17:00:39 Meeting started Tue Jul 21 17:00:38 2015 UTC and is due to finish in 60 minutes. The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:43 The meeting name has been set to 'third_party'
17:00:52 o/
17:00:57 who's here for the third party CI working group?
17:00:59 o/
17:01:10 hi asselin, mmedvede
17:01:54 asselin, thanks again for running the last meeting
17:02:12 hi
17:02:13 * krtaylor feels relaxed after vacation time off
17:02:21 hi patrickeast
17:02:28 you're welcome
17:03:16 I realized that I put the wrong date on the agenda, glad you all are here anyway
17:03:24 here's the agenda
17:03:27 #link https://wiki.openstack.org/wiki/Meetings/ThirdParty#7.2F21.2F15_1700_UTC
17:03:57 #topic Announcements
17:04:06 I don't have any, none listed
17:04:17 anyone have anything to quickly announce?
17:04:38 deadlines? news?
17:05:13 #topic Common CI Vsprint
17:05:29 looks like it went well, and now 4 are done!
17:05:48 hi, yes, didn't finish, but made a lot of progress
17:06:13 nodepool is the most challenging one because it involves changes to nodepool itself
17:06:27 asselin, what are you thinking for the next steps?
17:06:36 want to have a second vsprint?
17:07:08 krtaylor, not thinking about that. I think we can just do normal reviews and get it done that way
17:07:48 asselin, fair enough
17:07:51 part of the issue is that we are too dispersed geographically for nodepool, so it's difficult to iterate
17:08:03 I had a general question on the sprint. I noticed some refactoring/move patches were not just moving things, but combining more changes in a single patch
17:09:03 I think it did slow things down, e.g.
my patch (low priority) had a comment to have things changed, compared to how they were in system-config
17:09:06 #link https://review.openstack.org/#/c/199790/
17:09:16 mmedvede, yes, I try to limit & enforce scope, but definitely need to do better with that
17:09:36 it would be good to limit the initial drop to be just the refactoring to make it work
17:09:55 asselin: ok, good to know. I wanted to do the move without regression in one patch, and any improvements in different ones
17:10:44 krtaylor: +1
17:10:44 mmedvede, there are exceptions of course, but that is what we should aim for
17:11:16 perhaps submitting follow-up patches and commenting with 'done in patch#' can help
17:11:32 I feel there is sometimes a push to do more than necessary in a single patch, not sure why
17:12:43 as long as we agree, then comments to split out work will be supported
17:13:12 why would someone refuse a higher patch count? :)
17:13:46 I think we need to stand stronger on that.
17:14:44 I will add comments to that end: separate refactor from improvement patch.
17:15:12 #agreed Take a stronger review position on common ci patches that do more than minimal refactoring
17:15:17 asselin: thank you
17:15:31 mmedvede, thanks for bringing it up
17:15:47 hm, not sure agreed worked, whatever
17:15:56 we'll see in logs
17:16:08 asselin re: iterate on nodepool, not sure I understood that
17:16:42 just mean working through the patch review cycle https://etherpad.openstack.org/p/common-ci-sprint
17:16:57 there are quite a few interrelated patches
17:17:18 those are definitely more than a refactor
17:17:46 but necessary to not have the nodepool.yaml file being a template
17:18:08 so it's more of an improvement followed by a refactor
17:18:15 so just quicker reviews to land everything in one group
17:18:45 I understand what you are saying
17:19:30 actually, that sounds like a good exercise for a vsprint, with lots of ci and infra involvement
17:19:56 or at least focus hours during/after an infra meeting
17:21:39 honestly, we got quite a bit done prior to the virtual sprint, so I think we should do that by keeping reviews & testing active
17:21:54 asselin, your call, let us know how we can help
17:22:23 asselin: any idea why this one did not merge? https://review.openstack.org/#/c/199737/
17:22:36 * asselin looks
17:23:04 might need a re-nudge, maybe gerrit had problems at the time
17:23:21 yes, seems like it
17:25:11 oh I see....its depends-on is still in review
17:25:55 asselin: good catch
17:26:42 anything else for common ci?
17:26:45 i've nothing else
17:27:09 krtaylor: proceed
17:27:10 #topic Spec to have infra host monitoring dashboard
17:27:42 so this was moving well, no major problems
17:28:02 I learned about another dashboard
17:28:29 #link https://review.openstack.org/#/c/194437/
17:28:32 jogo wrote lastcomment
17:28:44 thanks mmedvede
17:28:50 #link http://jogo.github.io/lastcomment/
17:29:23 krtaylor: sorry to inject it here, did you look into having nagios + nagstamon (as a desktop app instead of dashboard)?
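
Note on the 17:25:11 observation that a change did not merge because its depends-on was still in review: the sketch below shows one way such a dependency could be spotted from a commit message. It assumes dependencies are declared as "Depends-On: <Change-Id>" footers; the function names and the status mapping are hypothetical and only illustrate the idea, they are not taken from the gating tooling itself.

```python
import re

# Assumption: a dependency is declared as a "Depends-On: <Change-Id>" footer,
# where a Gerrit Change-Id is the letter "I" followed by 40 hex digits.
DEPENDS_ON_RE = re.compile(
    r"^Depends-On:[ \t]*(I[0-9a-f]{40})[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def find_depends_on(commit_message):
    """Return the Change-Ids named in Depends-On footers, in order."""
    return DEPENDS_ON_RE.findall(commit_message)


def blocking_dependencies(commit_message, status_by_change_id):
    """Return the declared dependencies that have not merged yet.

    status_by_change_id is a hypothetical mapping of Change-Id to a status
    string such as "NEW" or "MERGED"; unknown changes count as blocking.
    """
    return [
        change_id
        for change_id in find_depends_on(commit_message)
        if status_by_change_id.get(change_id) != "MERGED"
    ]
```

A non-empty result from blocking_dependencies would explain an approved change that is not merging, which is the situation asselin spotted.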
17:30:20 wznoinsk, no, although I'm not sure it's a bad idea
17:30:47 but it would need to be a service that infra would host
17:31:12 that way, it would be available, else it would be dependent on someone privately hosting it
17:31:13 krtaylor: jhesketh suggested renaming the spec files to avoid confusion, I do not think it has been addressed
17:31:22 nagstamon is just an app on your workstation; it sits in your systray and polls the nagios server (over http) for any alerts the nagios server is seeing
17:32:04 mmedvede, I changed the topic, I didn't agree :) Also, we are moving the original spec...eventually
17:32:06 krtaylor: anyways, we can take it offline, I've got some experience with that and really like that (over dashboards or emails)
17:32:13 sweston, are you around?
17:33:10 krtaylor: yes, sir
17:33:32 wznoinsk, it would mean that someone would have to install that to see the history of a system that just posted a failed comment
17:33:34 reading backlog
17:34:01 wznoinsk, I'd think that a page would be easier for a dev to hit to see if a system was off in the weeds
17:34:24 krtaylor: nagstamon is just a desktop version of what you normally see on nagios dashboard(s)
17:34:50 you can use nagios webpages for 'non-infra' (devs)
17:34:58 sweston, thanks for joining us! I had pinged you yesterday, was wondering if you had a chance to see if a patch could change projects in gerrit
17:35:20 * asselin will be back in a few
17:35:33 krtaylor: I tried to keep lastcomment as simple as possible, it is a tiny python requests script that runs from a cron job right now
17:35:42 krtaylor: no, unfortunately I have not had any spare time at all.
I might be able to get to it later in the week
17:35:58 wznoinsk, I am certainly open to suggestions, I know that others are using nagios (we are too)
17:36:31 krtaylor: actually, I am getting ready to upgrade my systems again, so today or tomorrow would be a good time to test this
17:36:37 sweston, thanks, let me know if you can't, but it would be good to keep all the history and comments with the original spec
17:36:50 krtaylor: let's talk some other time as it's go-home time for me already, I'll catch you on #openstack-infra if you don't mind
17:36:52 wznoinsk: nagios is generally good at monitoring multiple hosts, I am confused how it can be used to show status of third-party CIs
17:37:06 sweston, else, we can capture the test and include it with a txt file when it is moved to third-party-ci-tools
17:37:38 jogo, thanks, I do really like the layout, clean and simple
17:37:47 krtaylor: yes, we can consider that as a last pass option
17:37:54 mmedvede: nagios is powerful, you can use any script or program (bash, python, perl, java or whatever you pick) to check 'a thing' for you and feed status back to nagios
17:38:41 wznoinsk, thanks for coming, and let me know what you are thinking, if it provides more with its framework, it may be useful for other tasks in infra as well
17:38:57 mmedvede: krtaylor: I'm a huge fan of nagios so I'll sound like everything is doable in nagios, because it is in my opinion :-)
17:39:10 wznoinsk: I see, so you basically suggest not to write a web frontend, but reuse nagios
17:39:16 krtaylor: sure, I'll catch you in the week
17:39:22 wznoinsk, it is a good tool, and widely used
17:39:30 wznoinsk, thanks
17:39:44 mmedvede: web frontend is probably not the strongest part of nagios especially for wider (than just infra) public so it may still be needed
17:39:59 but the infra part should be well covered by nagios out of the box
17:40:13 * krtaylor is trying to keep up with the different threads
17:40:24 wznoinsk: ok, now I am more confused.
Would definitely want to learn more about what you suggest :)
17:40:48 wznoinsk: I think we might be talking about monitoring different things
17:40:57 sweston, I'll ping you tomorrow and see where you are at, I think it would help speed the infra hosting spec if it were moved
17:41:19 krtaylor: and the code is fairly compact too
17:41:21 * asselin returns
17:41:26 krtaylor agreed
17:42:21 jogo, it would be ideal if we could combine efforts with patrickeast and have a super simple dashboard
17:42:59 jogo, patrickeast - have either of you compared with the other dashboard?
17:43:21 mmedvede: I monitor my 3rdparty CI with a script under nagios, not sure what exactly you want to monitor, could you ping me the spec pls?
17:43:39 yea we’ve chatted a bit about them
17:44:13 wznoinsk: this is the spec in discussion: https://review.openstack.org/#/c/194437/
17:44:31 * krtaylor has not had the time to do a functional comparison of dashboards
17:44:32 i think one issue is that they are kind of targeting different audiences, so they prioritize different features/designs
17:44:56 mine is more focused on someone who wants to troubleshoot a ci system and know what/where things broke
17:45:05 patrickeast: yeah, it would be possible to make two views. I think the big difference is how we collect data
17:45:21 wznoinsk: this is patrickeast's dashboard #link http://ec2-54-67-102-119.us-west-1.compute.amazonaws.com:5000/?project=openstack%2Fnova&user=&timeframe=24&start=&end=
17:45:29 I just use the gerrit REST API periodically, patrickeast wanted real time data so gerrit stream ...
which makes things a lot more complex
17:45:47 I am happy to see any solution that is agreed upon
17:45:58 I would be more than happy to stop running lastcomment
17:46:12 haha, same boat here, i’m ok either way too
17:46:20 i just want *something*
17:46:37 mmedvede: yes, I like patrickeast's work, even if that dashboard saw daylight for the first time (and it was more basic)
17:46:45 jogo, patrickeast thanks for your flexibility
17:47:42 jogo, as patrickeast said, yours is useful too, I don't see why you'd have to stop running it, unless you wanted to
17:47:53 I do understand there will be different criteria you'd be scoring the CIs on (i.e.: how many times per 10 a CI failed, how many times per 10 a 3rdparty CI failed when upstream jenkins DID NOT etc.)
17:47:55 krtaylor: +1
17:48:06 but having everyone jump in on maintaining one dashboard and improving it would be MUCH better
17:48:20 jogo: i think yours is actually more useful for someone who just wants to know what systems are ok for a project at a glance
17:48:32 it's just harder to figure out *why* they are broken
17:48:42 but that doesn’t matter to 99% of the openstack devs
17:48:44 krtaylor: +1 . I would prefer this as well, and would rather have folks contributing to radar
17:48:59 mmedvede: with more criteria and logic you want a more advanced dashboard, or easy to read dashboards 'per problem'
17:49:21 sweston has a good point
17:49:40 I think the most important thing is that we all agree, very quickly, on one dashboard to get the hosting spec done
17:49:43 patrickeast: right, that is exactly what I had in mind when I put it together
17:49:51 these others are supposed to be 'temporary'
17:50:03 then we can start working on the full featured solution
17:50:03 wznoinsk: I want something I can use to detect anomalies in our CI compared to others.
patrickeast's scoreboard is currently sufficient for our usecase
17:50:28 patrickeast: it wouldn't be too hard to add an option to my dashboard to list failed cases when you click something
17:50:39 wznoinsk: I do not necessarily want 'automagic' detection, I just want data presented in a consumable way
17:51:18 ok so do we have agreement to stay with scoreboard for now? else we prob have to wait 2 weeks to see any movement
17:52:28 jogo, would you be willing to move yours to third-party-ci-tools so others could contribute through the gerrit process?
17:52:43 krtaylor: no problem
17:53:09 If we would ask infra to deploy scoreboard or lastcomment, where would puppet modules go
17:53:11 ?
17:53:36 mmedvede, it's in the spec (roughly)
17:53:48 krtaylor: yes, sorry
17:53:55 krtaylor: I don't think it makes sense to have infra host a temporary solution
17:54:19 jogo, actually, that was their suggestion
17:54:25 in the time we are sorting out a 'temporary' thing a final thing could be done
17:54:47 they wanted something now, and it would get all the structure in place
17:55:18 that's really the catch here, we need a basic solution NOW
17:55:27 krtaylor: +1 for NOW
17:56:32 ok, well, we are running out of time for the meeting, move to email thread? or a quick decision?
17:56:37 mmedvede: I'm afraid that to check dashboard for all projects, all CIs in each project, scanning it once/a few times a day is taking a lot of human cycles, I'd prefer to have a check and threshold defined that lets me know
17:57:10 wznoinsk, it would really only be visited when a dev got a neg comment
17:57:21 and it would be filtered by project
17:57:36 just meant to see if a system is off in the weeds or not
17:58:07 and for that reason, I'd choose patrickeast's if I had to pick one today
17:58:34 krtaylor: yes, it's good for ad hoc checks
17:58:37 it's easy to compare a system's results to everyone else that tested the same patch
17:58:52 so do we agree?
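
Note on the "check and threshold" idea raised at 17:56:37 and the "how many times per 10 a CI failed" criterion from 17:47:53: a minimal sketch of such a check, written in the style of a Nagios plugin, might look like the following. The function name, the window of 10 and the warn/crit thresholds are illustrative assumptions, not values anyone proposed in the meeting; only the 0/1/2/3 exit codes are the usual Nagios plugin convention.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_ci_failures(results, window=10, warn=3, crit=6):
    """Score the most recent `window` results of one CI system.

    results is a list of booleans, oldest first, where True means the CI
    reported success. window, warn and crit are illustrative defaults and
    window is expected to be a positive integer.
    Returns (exit_code, status_line).
    """
    recent = results[-window:]
    if not recent:
        return UNKNOWN, "UNKNOWN - no results to score"
    failures = recent.count(False)
    summary = "%d of last %d runs failed" % (failures, len(recent))
    if failures >= crit:
        return CRITICAL, "CRITICAL - " + summary
    if failures >= warn:
        return WARNING, "WARNING - " + summary
    return OK, "OK - " + summary
```

A wrapper script would print the status line and exit with the returned code, so the monitoring server raises the alert instead of a person scanning a dashboard several times a day.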
17:59:53 +1
17:59:53 out of time
18:00:08 I'll move to email thread
18:00:19 thanks everyone, really good meeting
18:00:33 #endmeeting
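
Note on the polling approach described at 17:45:29 (using the gerrit REST API periodically rather than the event stream): the sketch below shows how a decoded REST response could be reduced to the last comment per CI account. It is not the lastcomment source; the function names are hypothetical, and it assumes the changes were queried with their messages included. Gerrit prefixes JSON responses with a ")]}'" line, which has to be stripped before decoding.

```python
import json

# Gerrit prefixes JSON responses with this line to defeat XSSI attacks.
GERRIT_MAGIC_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip Gerrit's anti-XSSI prefix and decode the JSON payload."""
    if body.startswith(GERRIT_MAGIC_PREFIX):
        body = body[len(GERRIT_MAGIC_PREFIX):]
    return json.loads(body)


def last_comment_by_account(changes, accounts):
    """Map each account name to its most recent (date, change number).

    changes is a decoded list of change entries that carry a "messages"
    list; accounts is the set of CI account names to look for.
    """
    latest = {}
    for change in changes:
        for message in change.get("messages", []):
            name = message.get("author", {}).get("name")
            if name not in accounts:
                continue
            stamp = message["date"]
            # Gerrit timestamps are fixed-width UTC strings, so plain
            # string comparison orders them chronologically.
            if name not in latest or stamp > latest[name][0]:
                latest[name] = (stamp, change["_number"])
    return latest
```

Run from cron against a periodic query, this keeps the collector small at the cost of freshness, which is the trade-off against a stream-based collector that came up in the discussion.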