15:00:03 #startmeeting third-party
15:00:04 Meeting started Mon Mar 7 15:00:03 2016 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:08 The meeting name has been set to 'third_party'
15:00:11 hello
15:00:35 hi anteaya
15:00:43 mmedvede: how are you today?
15:01:29 anteaya: good
15:01:34 glad to hear it
15:01:59 so I'll post a reminder to folks that openstack meetings use utc time on purpose
15:02:15 so if time changes for you next weekend, refer to utc time for meetings
15:02:18 o/
15:02:22 hey asselin
15:02:31 do we have anything to discuss today?
15:03:02 that was a question for either of you
15:03:05 not just asselin
15:03:13 there is a thread on third-party-announce with a CI reporting merge failures
15:03:19 yes
15:03:30 just wanted to note that it could be valid sometimes to report a failure to merge
15:03:32 the ci operator posted a reply today
15:03:41 mmedvede: can you share the usecase?
15:03:57 in this instance it was indicative of a problem on their end
15:04:01 e.g. if infra jenkins did not fail to merge, but third-party CI did
15:04:14 why is that important for a dev to know
15:04:24 what action should the dev take in that case?
15:04:24 if third-party CI does not report anything in that case, then it would not be clear what happened
15:04:42 let's look at it from the dev's point of view
15:04:48 what should they do in that case?
15:05:07 mmedvede, that's why the suggestion is to e-mail the operator.
15:05:13 in that case they would know that the merge failed, and could recheck accordingly
15:05:30 why would they recheck if jenkins was able to merge the patch?
15:05:32 I don't see any reason it should fail merge differently by 3rd party ci and infra
15:06:15 asselin: it could be different, because there is often a time difference between when infra zuul merger runs, and when third-party CI does
15:06:37 but in the case of a difference, what is the dev to do?
15:06:46 if jenkins merges?
15:07:15 anteaya: dev would know that third-party CI failed. There could be a case where they expect a +1 from third-party CI
15:07:23 infra isn't going to counsel people to rebase based on what a third party ci reports
15:07:58 I think the issue is that most 3rd party ci merge failures are not legitimate
15:08:13 well, if TPCI fails, it would also indicate that infra zuul would most likely fail on the next recheck or during gate
15:08:23 they are caused by e.g. network failures
15:08:40 asselin: +1 on most being not legitimate.
15:08:59 I am trying to make a case that in rare cases it could be useful.
15:09:06 if there is something the dev should know then as asselin says, the operator is emailed and can post a comment on a patch
15:09:16 informing the dev of the situation
15:09:18 Not saying that it should be encouraged for third-party CI to report merge failures all the time
15:09:29 sure
15:09:42 but that is where human intervention is necessary
15:09:48 that the operator read their email
15:09:55 and provide a comment on a patch
15:10:06 as a human, not a third party ci system
15:10:11 that is most welcome
15:10:20 and highly appropriate
15:10:47 does that make sense?
15:11:04 and thanks asselin for taking the time to reply to that thread
15:11:59 did we want to say more on this topic?
15:12:08 mmedvede: thanks for bringing it up
15:12:21 nothing more here, thanks for the discussion
15:12:36 thank you
15:12:46 so in this reply http://lists.openstack.org/pipermail/third-party-announce/2016-March/000295.html
15:13:02 the third party operator is missing the importance of receiving the email
15:13:11 would either of you like to reply to that post?
15:13:17 I can if you don't want to
15:13:31 was just thinking it would be nice to hear from other operators
15:13:57 right now their configuration has email going to a dummy account
15:13:58 I read it but wasn't sure how to reply more than restating my original e-mail
15:14:10 asselin: okay fair enough, thank you
15:14:23 mmedvede: would you like to reply? or shall I?
15:14:52 I am not sure what to reply there
15:14:59 okay that is fine
15:15:25 the fact they have email going to to: third_party_ci at example.com was what I was going to address
15:15:27 I can just say that the config he is asking about is the one he should be using, I guess
15:15:31 (with the email)
15:15:35 as I doubt that is their email address
15:15:58 I'll reply
15:16:03 thanks for taking a look
15:16:10 didn't they use those emails as examples?
15:16:19 I don't think so
15:16:31 I think that is what they have in their config file
15:16:45 so that is what I want to address
15:17:13 what we just discussed, read the email from your system and reply as a human on patches where it makes sense for the developer
15:18:15 so thanks for having the discussion with me
15:18:19 so I know how to reply
15:18:21 :)
15:18:32 is there anything more to be said on this topic?
15:18:52 nothing except I now fear our CI would start reporting merge failures too :)
15:19:01 need to make sure it does not
15:19:03 mmedvede: has it in the past?
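[Editor's note: the configuration discussed above is zuul's e-mail reporting for merge failures. A minimal sketch of what a zuul v2 layout.yaml pipeline might look like follows; the pipeline shape and addresses are illustrative assumptions, not the operator's actual config. The point is that merge failures go to a mailbox the operator actually reads, not to a placeholder address, and not as comments on the patch.]

```yaml
# Hypothetical zuul v2 layout.yaml fragment (names and addresses are
# illustrative). On merge failure, notify the operator by e-mail rather
# than commenting on the review; the operator can then reply as a human
# on patches where it matters.
pipelines:
  - name: check
    manager: IndependentPipelineManager
    trigger:
      gerrit:
        - event: patchset-created
    success:
      gerrit:
        verified: 1
    failure:
      gerrit:
        verified: -1
    merge-failure:
      smtp:
        to: operator@example.com   # must be a real, monitored mailbox
```

[This sketch also presumes a working smtp section in zuul.conf pointing at a reachable mail server.]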
15:19:06 yes
15:19:07 mmedvede: ah thank you
15:19:22 well I did try to find them online
15:19:30 and they weren't around
15:19:34 and a dev complained
15:19:38 mmedvede, I thought you were the one that originally suggested that :)
15:19:43 as they had 4 systems doing that
15:20:11 so four systems spamming patches was a little noisy
15:21:00 asselin: could be me who suggested it exactly because our CI was doing that
15:21:11 but that was a while back
15:21:26 good solutions stand the test of time
15:21:55 so thank you for your solution mmedvede :)
15:22:12 do we have other topics we would like to discuss today?
15:22:32 I do not want to take credit, possibly was not me. My memory fails me on that
15:22:47 I just know it wasn't me
15:22:51 hehe
15:22:53 thanks to whoever it was
15:23:05 and I'm fine if it was or was not mmedvede
15:23:15 I have one topic - wondered if anyone else has the same problem - zuul memory leak
15:23:26 mmedvede: yes infra has the same problem
15:24:02 ok, I am thinking of implementing a workaround - periodic restart of the zuul service
15:24:19 mmedvede: hmmmmm, we are trying to collect more data on the memory leak
15:24:31 we aren't fans of restarting for memory leak issues
15:24:57 that is sort of our last stand when trying to find the real solution
15:24:59 i believe jhesketh has been looking into the problem
15:25:37 restart is last resort, but guaranteed to work :)
15:25:42 jeblair: what is the current theory of the cause?
15:25:51 or is there a current theory?
15:25:56 mmedvede: true
15:26:28 last time for me it took zuul about 3 days to consume 95% of 8GB ram, and halt the system
15:26:40 mmedvede: :(
15:26:46 mmedvede: that makes us sad
15:27:15 i don't see an outstanding patch, so the best thing to do may be to continue discussion on the mailing list thread
15:27:19 mmedvede: so for you is the memory leak consuming memory faster than before?
15:27:33 it's possible jhesketh thought it was fixed and is unaware that it is not.
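[Editor's note: the workaround proposed above — a periodic restart of the zuul service before the leak exhausts memory — might be sketched as a cron-driven watchdog like the following. The process name, service command, and 6 GiB threshold are assumptions for an 8 GiB host; adjust for your install and init system.]

```shell
#!/bin/sh
# Hedged sketch: restart zuul-server when its resident memory exceeds a
# threshold, as a stopgap for the leak discussed above. Run periodically
# from cron, e.g.:  */10 * * * * /usr/local/sbin/zuul-mem-watchdog

THRESHOLD_KB=$((6 * 1024 * 1024))   # 6 GiB, leaving headroom on an 8 GiB host

# over_threshold <rss-kb>: succeed if the given RSS in kB is over the limit.
over_threshold() {
    [ "$1" -gt "$THRESHOLD_KB" ]
}

# Sum RSS over all matching processes; prints 0 if zuul-server is not running.
rss_kb=$(ps -C zuul-server -o rss= | awk '{s += $1} END {print s + 0}')

if over_threshold "$rss_kb"; then
    echo "zuul-server RSS ${rss_kb} kB over limit, restarting"
    service zuul restart   # or: systemctl restart zuul
fi
```

[As noted in the meeting, this is a last resort that masks the symptom; logging each restart at least preserves data on how fast the leak progresses.]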
15:27:35 * anteaya looks for the mailing list thread
15:27:41 anteaya: I did not see a pattern, sometimes it took a week/2 weeks
15:27:47 last time was fast
15:27:49 jeblair: I don't think it is fixed for infra
15:27:55 anteaya: i agree
15:28:31 #link http://lists.openstack.org/pipermail/openstack-infra/2016-February/003722.html
15:28:37 anteaya: while i think we restarted after jhesketh's most recent fix, i am not positive. that should be verified before we solidify that conclusion
15:28:44 mmedvede: so this is the start of the zuul memory leak thread
15:28:52 mmedvede: please add your experience
15:29:18 anteaya: ok, thanks for the link
15:29:20 jeblair: I think the zuul restart for the rename incorporated jhesketh's latest change
15:29:29 but again yes, confirmation would be good here
15:30:08 I also need to update to the latest zuul. But I did not see any patches land that made me think there was another fix attempt
15:30:09 mmedvede: and thank you for sharing your experience, hopefully with more folks running zuuls we can close in on a solution
15:30:56 #link http://git.openstack.org/cgit/openstack-infra/zuul/commit/?id=90b61dbde89402971411a63f7596719db63f6155
15:31:35 that merged Feb. 3rd
15:32:05 last rename was Feb. 12th: https://etherpad.openstack.org/p/repo-renaming-2016-02-12
15:32:14 yes, that is the last one I remember. Our zuul uses that patch
15:32:31 jeblair: based on those dates I'm going with zuul is running jhesketh's latest memory leak solution attempt
15:32:37 mmedvede: okay thanks
15:33:02 mmedvede: and you still have zuul using up memory within 3 days and ceasing to run
15:33:04 yes?
15:33:36 anteaya: no, it is not always the same. Last one was about 3 days. I assume it depends on patch volume
15:33:44 right
15:34:05 but I'm confirming that the last time you had an issue your zuul was running jhesketh's patch
15:34:23 is that accurate?
15:35:33 yes.
For the record, zuul ui is reporting 'Zuul version: 2.1.1.dev123'
15:35:39 mmedvede: thank you
15:36:17 so please include your experience on the mailing list thread
15:36:38 anteaya: ok
15:36:44 it is possible, as I believe jeblair stated above, that jhesketh believes the memory leak is fixed
15:36:52 yet you have data that this is not the case
15:37:26 if you could do a git log on your zuul version and confirm the presence of http://git.openstack.org/cgit/openstack-infra/zuul/commit/?id=90b61dbde89402971411a63f7596719db63f6155
15:37:30 that would be awesome
15:37:39 that way we aren't guessing
15:37:49 thanks for bringing up the topic, mmedvede
15:38:08 do we have any more discussion on this topic?
15:38:38 nothing from me
15:38:47 do we have any other topic we would like to discuss?
15:38:53 thanks mmedvede
15:39:18 are there any objections to me closing the meeting?
15:39:38 thanks everyone for your kind attendance and participation today
15:39:47 \o
15:39:54 check utc times for meetings next week if your time changes
15:40:02 kcalman: have you an item you wish to discuss?
15:40:18 No, I'm good
15:40:22 okay thank you
15:40:24 welcome
15:40:38 thanks
15:40:42 I look forward to seeing everyone next week
15:40:45 thank you
15:40:48 #endmeeting
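[Editor's note: the verification requested in the meeting — a git log on the running zuul to confirm the presence of commit 90b61dbde89402971411a63f7596719db63f6155 — can be done without grepping log output by asking git whether the commit is an ancestor of HEAD. A minimal sketch; the checkout path /opt/zuul is an assumption, adjust to where your zuul is installed from.]

```shell
#!/bin/sh
# Check whether jhesketh's memory leak fix is in the zuul checkout being run.
FIX=90b61dbde89402971411a63f7596719db63f6155

# has_commit <repo-dir> <sha>: succeed if <sha> is an ancestor of HEAD
# (a commit counts as an ancestor of itself).
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}

# /opt/zuul is a hypothetical install path.
if [ -d /opt/zuul/.git ]; then
    if has_commit /opt/zuul "$FIX"; then
        echo "memory leak fix $FIX is present"
    else
        echo "memory leak fix $FIX is NOT present"
    fi
fi
```

[Note that this only checks the checkout; it does not prove the running zuul process was restarted after that commit was pulled, which is the other confirmation the meeting asked for.]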