18:00:17 #startmeeting third-party
18:00:18 Meeting started Mon Sep 8 18:00:17 2014 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:21 The meeting name has been set to 'third_party'
18:00:34 hello
18:00:37 anyone here?
18:00:42 hiya
18:00:47 Hello !
18:00:52 great
18:00:55 welcome
18:01:01 #topic Welcome & Reminder of OpenStack Mission
18:01:01 o/
18:01:12 o/
18:01:14 o/
18:01:20 #info The OpenStack Open Source Cloud Mission: to produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private clouds regardless of size, by being simple to implement and massively scalable.
18:01:35 our mission just to keep us on course
18:01:52 hello all!
18:01:57 welcome to anyone who is here for the first time
18:02:05 and hello to all yon regular folks
18:02:10 #topic Review of previous week's open action items
18:02:23 #info third party team to review https://review.openstack.org/#/c/99990/
18:02:28 so this merged
18:02:39 and we had some reviews from some third party people
18:02:41 thank you
18:02:57 ociuhandu and daya_k thanks for your reviews
18:03:01 did I miss anyone?
18:03:43 so that spec is all about creating individual puppet modules so that they are easier for third parties to consume
18:03:53 so we will follow progress on that
18:04:04 and folks are welcome to help out (read: encouraged)
18:04:10 #topic Announcements
18:04:22 so if we go here
18:04:26 #link http://lists.openstack.org/cgi-bin/mailman/listinfo
18:04:33 here are all the lists for openstack
18:04:44 near the bottom of the page you will see two new ones
18:04:58 third-party-announce which you all should be subscribed to
18:05:19 since this is the location we make announcements, like if your system is disabled and how to get it re-enabled
18:05:29 and the other is third-party-requests
18:05:35 note the name change
18:05:44 it was originally third-party-request
18:06:00 but we had to change the name since -request is a reserved word for mail servers
18:06:08 any questions here?
18:06:20 anteaya, For help in the third party, which ml should we use
18:06:28 lyxus: good question
18:06:45 -dev or -infra for now
18:06:50 for questions about how to operate a specific tool, the -infra list is the best place
18:07:00 since that was the point of -infra in the first place
18:07:10 third-party-announce is meant to be low traffic
18:07:24 perfect ! Wanted to double check !
18:07:38 sure
18:07:43 anything else here?
18:08:11 moving on
18:08:17 #topic OpenStack Program items
18:08:29 there are no agenda items here
18:08:40 anyone representing a program with any items?
18:08:57 moving on
18:09:01 #topic Deadlines & Deprecations
18:09:24 well since we are after feature freeze, most deadlines have passed I do believe
18:09:44 and anything that is going to be deprecated is being deprecated
18:10:02 so I don't expect much here until we get into K-1 at the very least
18:10:16 so we can move on, unless anyone needs to speak up
18:10:25 Just an update from Cinder.
18:10:29 jungleboyj: go
18:11:09 We are working to get everyone to stabilize their CI processes. Duncan is going to propose patches to remove anyone who hasn't at least started the process of getting a third-party job up and running.
18:11:21 o/
18:11:32 jungleboyj: good to know
18:11:38 Since the deadline has long passed.
18:11:42 jungleboyj: yes
18:11:49 jungleboyj: anything else?
18:12:10 Nope, that was all I had. Thanks anteaya
18:12:25 jungleboyj: thanks for the update
18:12:29 anything else here?
18:12:37 moving on
18:12:38 jungleboyj: +1
18:12:44 #topic Highlighting a Program or Gerrit Account
18:12:55 #info CI validation process (emagana)
18:12:59 emagana: your floor
18:13:18 sure anteaya
18:13:46 just a brief summary. Neutron already has a very long list of plugins and drivers
18:14:06 #link https://wiki.openstack.org/wiki/Neutron_Plugins_and_Drivers#Existing_Plugin_and_Drivers
18:14:32 They were providing "results" and testing commits in a very heterogenous way
18:14:54 So, I proposed an audit a few weeks back based on positive and negative commits:
18:15:01 https://review.openstack.org/#/c/114393/
18:15:03 can we have a definition of your use of heterogenous for the non-english speakers in the crowd?
18:15:07 https://review.openstack.org/#/c/114629/
18:15:19 anteaya: Absolutely!
18:15:26 #link https://review.openstack.org/#/c/114393/
18:15:43 #link https://review.openstack.org/#/c/114629/
18:16:02 anteaya: having no uniform standards.
18:16:12 great thank you
18:16:26 By heterogenous I mean that those CI systems were not following the same rules for triggering their tests and they were not posting results in the same format (or the same logs)
18:16:39 great
18:16:50 finally, some of them were not even testing realistic environments (this is why I included the negative tests)
18:17:36 the outcome was really good, we got the following enhancements:
18:18:14 1) Neutron team established minimal CI testing requirements for the Juno release
18:19:05 2) ~60% of Neutron CIs improved their testing and converted it to a more homogenous path
18:19:48 3) Clear view of which plugins/drivers should be deprecated
18:20:04 4) Better CI community after all!
18:20:15 yay!
18:20:21 great work here emagana
18:20:48 your two patches asking ci systems to calibrate against them is a great direction
18:20:53 well done
18:20:55 5) Neutron team will spend some time during the Kilo summit to establish hard requirements for the next release
18:21:20 great, hopefully that session won't conflict with other things so I can sit in and listen
18:21:22 So, I encourage all other projects to do the same
18:21:28 awesome
18:21:30 Open Actions:
18:21:48 How to make this "audit" an automated process (I really invested a lot of time)
18:21:56 neutron feedback from Juno - the standards are great, but i think we should consider: 1) not changing them 3 times within one cycle (yes, neutron really did), and 2) providing a 3-4 week on-boarding time, when you can be live but not commenting, so you can get things stable without spamming the community with false negatives. right now the review rules are
18:21:57 so rigid that you either submit your code 2 months early, or you push an immature CI on everyone.
18:22:36 emagana: I am really grateful for the time you spent on this
18:22:51 emagana: and yes, having some sort of automated mechanism would be great
18:23:19 dougwig: feedback well taken, this is why I have already requested some summit time from the Neutron PTL to discuss the standards and NOT change them during the release cycle
18:23:34 emagana: can you add an item to tomorrow's infra meeting agenda, to show them these patches and get it on their radar too so that they can start thinking about automation as well?
18:24:00 anteaya: I can but I won't be able to attend tomorrow's meeting. I have a full training day
18:24:15 emagana: ah okay next tuesday perhaps?
18:24:23 anteaya: absolutely!
18:24:26 thanks
18:24:39 emagana: thank you, this is wonderful work!
18:24:53 dougwig: I hear you, and this is part of the growing pains we are going through
18:25:17 sweston: Thanks! I want to thank the CI owners for supporting this effort and not getting mad at me!! :-)
18:25:24 since we are all trying to find the best path through with no successful role model we make mistakes
18:25:43 dougwig: so having your feedback, hopefully at the summit will be helpful, will you be at summit?
18:26:13 anteaya: yes, i wil
18:26:17 i will
18:26:23 dougwig: great
18:26:24 emagana: :-). The topic of automation is not new, if you ping me after your training, later in the week, I would be happy to discuss a few of my thoughts with you
18:26:39 sweston: sounds good! I will do that!
18:26:49 sweston, let's put it on the agenda for next week
18:26:56 and then hopefully the two of you can share them with the rest of us
18:27:14 Another question is how much time we have before fixing a problem when our CI gets broken. In a small team all members can take vacations at the same time.
18:27:27 sweston, emagana - good for a summary next week?
18:27:39 depends on what is breaking and how it is affecting others
18:27:41 krtaylor: Sure!
18:27:43 what is the best practice? One idea is to disable our CI to avoid any problem.
18:27:52 krtaylor, anteaya: yes, will do.
18:28:00 amotoki_: to disable when you all are on vacation, you mean?
18:28:03 amotoki, we always try to have someone "on call"
18:28:32 anteaya: yes :-(
18:28:46 amotoki, it's part of having a public facing service for developers
18:29:07 amotoki_: well at the very least tell me when you all are going away then if there are problems, we will just disable, not wait to hear from you
18:29:10 Now! I want to add something.. A rumor has it that All vendor-specific plugins and drivers may get removed from the main tree!!!!
18:29:15 krtaylor: Is it a requirement for third party CI? If so, it should be documented clearly.
18:29:24 amotoki_: if you want to disable as a preventative measure that is up to you
18:29:28 emagana: It's under discussion, yes, it's on the table.
18:29:45 amotoki, it is a requirement to be able to turn off your system if it is broken
18:29:53 perhaps it is also worth discussing.
18:30:03 mestery, can you clarify what it is ?
18:30:03 amotoki_: how would we be able to know if someone is on vacation or not?
18:30:14 amotoki_: we can't make that a requirement
18:30:21 lyxus: This is not the right forum for that
18:30:24 a working system is a requirement
18:30:32 lyxus: Yes: To ease review burden and make it easier for plugin/driver owners to iterate, we're considering moving them out of the main repo.
18:30:38 This is similar to what Nova is considering for hypervisor drivers.
18:30:42 lyxus: Let's discuss on neutron channel for during our weekly meeting
18:30:49 anteaya: it is a good question. Taking a vacation leads to less response... just an example.
18:31:04 mestery, emagana OK !
18:31:11 amotoki_: right, so at the very least if no one is going to be maintaining your ci, let me know
18:31:20 and how long you will be away
18:31:20 anteaya: amotoki_ it seems to me like we need to better define a requirement around response time when a ci system fails
18:31:36 jesusaurus: well then that limits me
18:31:52 sometimes I will ask and wait for a response
18:31:58 that would help
18:32:02 and sometimes, I just have the system disabled
18:32:20 it depends on the severity of the issue and what else is happening
18:32:30 it has always been a "reasonable" amount of time
18:32:39 the other day I made an effort to get a hold of a brocade ci
18:32:39 anteaya: can you clearly define when you do which?
18:32:46 waited and waited
18:32:53 hehheh
18:32:54 then they just told me to shut it off
18:32:57 so I did
18:33:10 no, because I don't clearly define it myself
18:33:17 it depends on how much time I have
18:33:21 okay, that was my question
18:33:23 what else is going on
18:33:27 anteaya: I think some limits should be defined, per “severity level” (also defining the severity levels)
18:33:31 look at the backscroll in -infra
18:33:44 citrix xenserver is failing builds
18:33:46 anteaya: and also help in justifying resource requests sometimes
18:33:50 and bobball wasn't in irc
18:33:56 so I shut them off
18:34:13 it also has a lot to do with my knowledge of the system
18:34:24 as if all members are in a certain time-zone, it’s way harder to provide timely responses during the “night”
18:34:29 if bob isn't in channel when I ping him, and his nick has _AWOL at the end
18:34:36 waiting for him is a waste of my time
18:35:14 in any of my actions
18:35:25 does anyone feel I have treated their system unfairly?
18:35:36 have I failed to communicate?
18:35:49 failed to reply or respond in a timely fashion?
18:35:56 could that be considered as a candidate for automation? at the very least, if a system is outside the parameters we define, we should at least stop that system from gating.
18:36:01 anteaya: for sure not, but these requirements should still be documented :)
18:36:08 ociuhandu: sure
18:36:11 they are
18:36:44 anteaya: saying that the CI has to have 24x7 or 24x5 or … response time on irc, for instance?
18:36:51 #info The OpenStack Infrastructure team disables mis-behaving third-party ci accounts at its discretion.
18:37:01 #link http://ci.openstack.org/third_party.html
18:37:19 why would I want to police that?
18:37:38 really good point
18:37:50 every requirement is one more obligation that *I* have to carry around
18:38:00 use good judgement
18:38:03 I always assumed that it was up to the CI system to police itself
18:38:07 communicate changes
18:38:17 krtaylor: that is the best case scenario
18:38:25 a CI system should know when it is having problems
18:38:37 wouldn't that be nice
18:38:48 hehheh, well, it IS a goal
18:38:53 and a good one
18:39:11 we are looking at different service monitors for notification
18:39:27 every system should be, maybe a good place for a community effort
18:39:33 and a community effort
18:39:38 if I ever use poor judgement or you feel I am treating your ci system unfairly, be sure to bring your perspective to my attention
18:39:43 so that I can do a better job
18:39:55 my service monitor is to get an email on every job, and to go inspect the results. it sucks. :)
18:40:21 dougwig: at least you are paying attention
18:40:22 anteaya: I don't think your judgement has ever been an issue :-)
18:40:26 full marks from me
18:40:29 sweston: thanks
18:40:40 do we have more here, or shall we move on?
18:41:04 let's move on
18:41:09 anteaya: +1
18:41:11 I would like to stay with this topic
18:41:14 for a moment
18:41:23 and draw your attention to a ml thread
18:41:39 #link http://lists.openstack.org/pipermail/openstack-dev/2014-September/045137.html
18:42:05 I would like to heap kudos on the cinder team and particularly DuncanT and asselin__ for their participation in this thread
18:42:28 this is what it looks like when a new ci account requests permission to start voting on a repo
18:42:34 glad to help
18:43:00 and what it looks like when experienced members in this program weigh in giving the benefit of their experience to the new operator
18:43:05 this is awesome
18:43:20 actually, the e-mail exposed a question I have for others: how do you measure stability?
18:43:26 and exactly the direction that I had hoped requesting permissions would take
18:43:35 asselin__: good question
18:43:45 asselin__: you had a good reply in the thread
18:43:53 for starters some history
18:44:03 two weeks of life does not a stable history make
18:44:06 really? I think I missed it....
18:44:11 you said it
18:44:32 I don't see a "stable history"
18:44:57 so some comments over a period of time
18:45:30 plus if cinder wants to use emagana's idea of offering patches to ensure positive and negative test results
18:45:44 and have them test that patch on some sort of schedule
18:45:48 that would help
18:46:01 asselin__: does that help answer your question?
18:46:17 I like the idea of the test patches, but what about intermittent issues on the driver/plugin side?
18:46:21 stability: at least for nova, it was up to the community on whether a system was stable or not, whether they could vote or not
18:46:30 asselin__: then do they communicate?
18:46:48 asselin__: do they tell people, my ci is failing to build, I have taken it offline
18:47:00 krtaylor, in cinder, we're not supposed to publish any comments until we're "stable"
18:47:08 asselin__: do they update their wikipage entry in the thirdpartysystems page to say they are offline
18:47:29 asselin__: they can on the sandbox repo
18:47:43 asselin__: that is what the sandbox repo is meant for
18:48:23 so they need to pass X sandbox tests successfully, and then they're stable?
18:48:29 mostly it comes down to, do they care and are they making good choices
18:48:40 stable enough to comment on real cinder patches
18:48:44 then we can disagree about the value of X
18:48:48 asselin__, yes, that frames the stability discussion, but I don't know if running a simple test every few hours really shows that a system is doing the right thing, but I don't have a better idea at the moment
18:49:24 asselin__: well the "can't comment on cinder until they are stable" is coming from cinder, the part that infra wants to see is not voting until they are stable
18:49:37 in cinder, we have clearly defined what tests need to run.
18:49:42 great
18:50:01 so on the sandbox repo, you can see if they have logs for those tests, can you not?
18:50:08 ok, cinder ptl doesn't want commenting until stable either, this includes non-voting.
18:50:22 for commenting, stability is one thing, for voting it should be a higher bar
18:50:24 okay
18:50:38 that is cinder's prerogative
18:50:48 krtaylor: +1
18:50:54 so the sandbox repo is where I would say you do your evaluation
18:51:01 asselin__: would that work?
18:51:05 right, I guess other projects allow commenting, even if unstable, so they can see it directly. For us, we should e.g. use sandbox to prove stability of last X patches.
18:51:16 yes, I think that can work.
18:51:20 that is the direction I would suggest yes
18:51:24 great
18:51:29 let us know how that goes
18:51:32 krtaylor: which is why I think that it would be easier to allow automatic disabling, for turning off voting systems
18:51:34 sure
18:51:39 thanks
18:51:45 let's move to open discussion
18:51:58 #topic Open Discussion
18:52:01 okay here we are
18:52:09 one thing I did want to share with you
18:52:31 at a certain point I had wanted to have a conference type event focused on automated testing
18:52:45 currently I am unable to pursue this direction
18:52:59 some of you may have been aware that I was moving towards organizing this
18:53:07 so I thought I should tell you I can't anymore
18:53:16 if someone else wants to that is fine
18:53:24 or something may happen in future
18:53:34 but I can't organize anything for kilo
18:53:40 oh, wow
18:53:46 anything else for open discussion
18:53:49 I have some concerns about having one CI account for multiple plugins to be tested
18:53:58 dane_leblanc: please expand
18:54:01 We have 6 different things to test and 1 account
18:54:06 So for example
18:54:16 (1) Marking CI as ‘down’ on 3rd party wiki page – all or none
18:54:31 If one testbed is failing, we have to mark all as down
18:54:39 hmmmm
18:54:39 (2) Voting rights per account – all or none
18:54:50 yes, no way around that
18:54:51 (3) One gerrit review comment per CI…
18:55:06 ah, yes that is a requirement
18:55:07 (4) Single point of contact – hard to scale, slower response
18:55:24 dane_leblanc, what different things?
18:55:24 depending on the company, but yes I can see that point
18:55:41 dane_leblanc: more points to add?
18:55:41 The different things are listed here:
18:55:43 https://wiki.openstack.org/wiki/ThirdPartySystems/Cisco_CI
18:56:00 (5) Account naming, difficult to have a descriptive account name for everything
18:56:14 dane_leblanc: yes, account naming is a dragon
18:56:22 dane_leblanc, at IBM as you can imagine, we have many different test systems, all with their own gerrit service id
18:56:36 dane_leblanc: was there a solution you wanted to present?
18:56:40 (6) Latency on one testbed can slow down responses for other testbeds
18:56:49 that I didn't know
18:57:22 3 minutes left
18:57:29 No solution, except maybe allowing multiple accounts, or improving the granularity e.g. for marking systems down and voting rights
18:57:32 dane_leblanc, multiple systems/accounts are the answer it sounds like
18:57:37 I'm hoping to get to dane_leblanc's thoughts on the way forward
18:57:47 Scaling is nuts for us right now.
18:58:01 dane_leblanc: well infra doesn't limit the number of accounts from one company
18:58:06 I didn't know there was a limit on number of accounts.
18:58:16 dane_leblanc: are you wanting a separate account for each plugin you test?
18:58:16 I was told we could only have one
18:58:20 we just encourage people to consider consolidating if they can
18:58:32 dane_leblanc: do you recall who told you that?
18:58:37 Clark
18:58:42 dane_leblanc: because I don't remember ever saying that
18:58:44 I hope not, we (IBM) would be in big trouble
18:58:48 clarkb: have a minute?
18:59:01 sure?
18:59:03 thanks
18:59:17 any objection to cisco having more than one gerrit ci account?
18:59:55 no, I think we have given up asking companies to coordinate on their end. it's just easier for everyone to have multiple accounts
19:00:00 thanks
19:00:01 no objection
19:00:05 thanks dane_leblanc
19:00:14 Thank you, anteaya!
19:00:20 please send requests to third-party-requests mailing list
19:00:22 :D
19:00:24 and we are at time
19:00:26 :)
19:00:30 it'll make the new gerrit CI display stuff look less nice, but given the problems they're having, that's not a deal killer.
19:00:31 thanks everyone for a great meeting
19:00:35 see you next week
19:00:38 bye
19:00:41 bye all
19:00:43 #endmeeting