22:01:31 #startmeeting zuul
22:01:32 Meeting started Mon Oct 23 22:01:31 2017 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:01:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:01:36 The meeting name has been set to 'zuul'
22:02:03 there is no agenda for this meeting -- so let's start by asking if there's anything folks want to talk about :)
22:03:08 i'd like to give folks a quick heads up on some branch-related issues i've started digging into
22:03:15 point of order: zuul v3 is awesome
22:03:26 yes it is!
22:03:39 i have a topic
22:04:03 Shrews: what's that?
22:04:08 and maybe we should give the issues etherpad a quick once-over?
22:04:18 i've been working on migrating the nodepool jobs (https://review.openstack.org/#/q/topic:nodepool-migration+status:open). all of those (except the WIP) could use reviews
22:04:30 Shrews: very meta!
22:04:37 so i need some nodepool cores to get active there
22:05:15 okay, if you have other topics, let me know and we'll wing it
22:05:22 #topic zuulv3 issues
22:05:30 jeblair: it's all been very confuzzling trying to get everything in all the projects in the right order, so i sympathize with our users
22:06:09 Shrews: good! we're supposed to do that :)
22:06:51 so first thing, i guess: is that we're still in firefighting mode
22:07:00 #link https://etherpad.openstack.org/p/zuulv3-issues
22:07:10 that etherpad is active
22:07:19 if you have time to jump on issues there, please do so
22:08:08 i think most of the issues under zuul are being worked in some form or other
22:08:33 anyone have any questions about those?
22:08:46 fungi: i left a comment on your sqlreporter patch
22:08:58 thanks!
22:09:19 it may be the comment to which i already responded
22:09:23 fungi: briefly: i think we can merge that as a quick fix, but there's a slightly larger patch that we should write soon -- before we have tristanC's dashboard
22:09:30 \o sorry I'm late
22:10:07 jeblair: yep, i'm good either way and happy to work on the more correct solution, just looking for pointers from someone with a deeper knowledge of the orm bits
22:10:08 fungi: ah yeah, so i think we'll want a migration to allow nulls
22:10:55 basically, if we do it now, all we need to do is change the column; if we do it later, we'll need to 'update buildset set change=null where change=0' or something. not too big of a deal.
22:11:42 i also added a related issue to the pad
22:12:00 I guess we will need to unfreeze the v2 files to let https://review.openstack.org/#/c/507180/ merge
22:12:02 may be helpful if one reporter breaking doesn't cause zuul to skip other reporters
22:12:08 fungi: and yeah, i don't think there's any work in progress to make the reporters more idempotent.
22:13:03 i think the only trick there is what to do if gerrit fails to merge the change. if we make them idempotent, does that impact any of the subsequent reporters?
22:13:33 (currently, if gerrit fails to merge, no other reporters run)
22:13:52 my inclination would be to make them idempotent. we can't do a two-phase commit across them, so no use pretending we do. :)
22:14:03 * clarkb wanders by late
22:14:06 ahh, yeah, seems like reporters which also merge changes are a slightly different class than those which just provide data
22:14:42 fungi: is the failing proposal a zuul issue or job issue?
22:15:05 i put it under zuul since the patch will be to zuul's codebase
22:15:22 oh, wait, failing proposals
22:15:31 http://logs.openstack.org/periodic/git.openstack.org/openstack/requirements/stable/newton/propose-updates/a515bba/job-output.txt.gz
22:15:33 that link ^
22:15:34 we've switched subtopics, sorry ;)
22:15:52 i saw that on the pad and am trying to regain my former context there
22:16:47 that's a job issue using ZUUL_REFNAME right?
22:16:54 (which is not something we'll continue to provide in v3)
22:17:16 oh, right, this issue got somehow split from the refname details
22:17:30 clarkb: i think zuul_refname is sometimes provided by the legacy filter
22:18:03 iirc the comment in the filter says it is intentionally omitted but unsure if 100% of the time
22:18:33 clarkb: that's zuul_ref. refname should be there for non-change items.
22:18:39 oh right
22:19:02 this is the thing where we had to set refname explicitly on old v2 periodic jobs
22:19:03 so... for periodic pipeline jobs i don't think v2 provided one at all
22:19:15 fungi: correct, it was hardcoded on the job before iirc since we had a job per branch
22:19:23 and, yeah, i think this has since been solved?
22:19:35 but I want to say that was lost in the conversion process. And ya I want to say the job was updated?
22:19:51 to just use what the checkout is rather than try and checkout in the job body itself?
22:20:05 "propose-updates" should, by virtue of not having the word legacy in it, not be using ZUUL_ vars
22:20:22 so hopefully things are as clarkb says
22:20:29 jeblair: ya it's in the script it runs so the jjb bits didn't but then the script in jenkins/scripts did/does
22:20:33 yah, are they using parent: base?
22:20:35 * clarkb looks to see if that was cleaned
22:20:54 right, the issue was within the scripts it runs, which to retain backward-compat in case of a rollback (we hadn't decided no rollback yet at that point) it was retained in the script
22:21:38 # Zuul v3 adds refs/heads, remove that to get the branch
22:21:40 then
22:21:46 ZUUL_REFNAME=${ZUUL_REFNAME#refs/heads/}
22:22:42 ya I think this was addressed based on the git log
22:22:53 the job is getting past that point now, so i think it can be updated to whatever the new failure reason is or moved to the fixed list and a new issue added for the new failure mode
22:22:57 #link http://logs.openstack.org/periodic/git.openstack.org/openstack/requirements/stable/newton/propose-updates/35f37e8/job-output.txt.gz
22:23:01 that's from today's run
22:23:02 oh. it's an old job masquerading as a new one.
22:23:04 fungi: yup I agree
22:23:06 ya, propose-update jobs parents to base, we should likely change that to legacy-base if still using zuul-cloner
22:23:24 if it's not using legacy-base, how's it getting ZUUL_REFNAME?
22:23:34 jeblair: it is populating it itself
22:24:02 from?
22:24:10 also, and perhaps related (or perhaps not?) this looks like the job to generate constraints updates, but we're not supposed to run that for any branch besides master because we freeze constraints on stable branches
22:24:28 fungi: i think AJaeger had a patch to fix that today
22:24:31 jeblair: not sure yet but where I am reading it it is passed as a cli arg to the script
22:25:01 playbooks/proposal/propose-updates.yaml: command: "{{ ansible_user_dir }}/scripts/propose_update.sh {{ update_target }} {{zuul.branch}}"
22:25:09 #link http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/scripts/propose_update.sh#n91
22:25:13 passes it in there
22:25:23 ah, ok.
22:25:30 so yeah, that should be pretty v3-safe.
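The `${ZUUL_REFNAME#refs/heads/}` line quoted above relies on POSIX shell prefix removal. A minimal sketch of that behaviour follows; the function name `branch_from_ref` is hypothetical and is not taken from propose_update.sh, and the sample ref values are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of project-config): reduce a ref name to a
# bare branch name using the same expansion quoted in the log.
branch_from_ref() {
    ref="$1"
    # "${var#pattern}" strips the shortest matching prefix. A value that
    # does not start with refs/heads/ is returned unchanged, so a plain
    # branch name such as "master" passes through as-is.
    printf '%s\n' "${ref#refs/heads/}"
}

branch_from_ref "refs/heads/stable/newton"
branch_from_ref "master"
```

Because the expansion is a no-op when the prefix is absent, a script written this way should accept either form of the value, which is consistent with the backward-compat reasoning in the discussion.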
22:25:44 cool
22:26:00 and now that we've decided to stop maintaining backwards compat, we can remove the ZUUL_REFNAME references from the script
22:26:18 my main concern is when we move forward with https://review.openstack.org/513506/ if that job was still using zuul-cloner too
22:26:26 but, looks to be okay also
22:26:48 jeblair: agreed
22:26:50 pabelanger: we should probably only merge that change after we remove zuul-cloner from images
22:27:06 fungi: and ya current issue looks like update-constraints has moved on from newton and grown some new flags that we should just avoid using against stable branches
22:27:28 pabelanger: otherwise, i think it will have the opposite effect (zuul-cloner will work "better" because it doesn't "need required-projects" any more) :)
22:28:09 sure, I'm trying to avoid having jobs move to parent: base while using zuul-cloner
22:28:39 pabelanger: i understand. that change won't stop that until we remove zuul-cloner from images.
22:28:45 when do you think we'd remove zuul-cloner
22:28:59 (from images)
22:29:07 pabelanger: as soon as someone writes that change, i would think. now that we've decided not to roll back.
22:29:12 The shim?
22:29:16 jeblair: ack
22:29:29 dmsimard: the actual installation that the shim then goes in over the top of
22:29:36 Ah, okay.
22:30:03 so if you're a legacy job, you still get the shim. if you're not, there is no zuul-cloner.
22:30:09 only zuul.
22:30:21 :)
22:30:21 ++
22:30:39 ;)
22:30:45 okay, anything else from the etherpad jump out at anyone?
22:31:23 #topic nodepool jobs
22:31:24 do we still need to restart executors for latest ssl fix?
22:31:30 #undo
22:31:30 Removing item from minutes: #topic nodepool jobs
22:31:31 On my end, mostly the ara wsgi thing. I think that would easily allow us to have 1) ara reports all the time 2) at a much lower cost
22:31:46 clarkb: yes, i can probably knock out some new executor restarts this evening if that has merged now
22:31:49 clarkb: i think we do, yes.
22:31:57 ok let me know if I can help with those
22:32:12 you can most certainly help. i'm merely volunteering ;)
22:32:22 we can split them up or whatever
22:32:26 I've ended up down the bwrap rabbit hole today due to that too fwiw :P
22:32:28 dmsimard: do we need any new ara releases for that?
22:32:54 jeblair: nope. Just a wsgi script which, pending a new release, is in-tree in puppet-openstackci
22:32:55 i'm still helping hammer on the release jobs (latest fun is tag-releases can't auth to lp when it wants to add bug comments)
22:33:37 The wsgi script will be bundled in ara on the next release so we can just stop carrying it.
22:34:04 Oh, on that note, Ansible 2.4.1 should be out this Wednesday.
22:34:19 Along with the new release of ara to support 2.4.1 properly.
22:34:37 dmsimard: cool. i'm somewhat inclined to defer that until after we manage to release openstack. we have delayed the release, and it may be best to avoid risking further disruption for something not immediately on fire right now.
22:34:46 dmsimard: hopefully that's not long though. :)
22:34:59 Sure
22:35:28 reviews in the meantime are appreciated, even if we don't deploy it yet
22:35:32 (i totally want ara back though, i have missed it. :)
22:35:40 ++ re getting release working first
22:36:41 #topic nodepool jobs
22:37:03 #link please review https://review.openstack.org/#/q/topic:nodepool-migration+status:open
22:37:20 Shrews: what issues did you run into when doing that?
22:38:21 jeblair: first and foremost was understanding the PTI policy and the reasoning behind it. but i understand that now
22:39:21 cool. i think there are maybe some grey areas there still... i was thinking of bringing it up at tomorrow's infra meeting.
22:39:46 jeblair: the other was understanding the job variants and disabling a template job for certain branches.
22:40:14 jeblair: yeah, i've seen others have the same confusion in #-infra
22:40:58 yeah, i think that's squarely in the grey area. basically: where should a project disable or alter a PTI job.
22:43:01 thanks to pabelanger, all changes to project-config to remove the problematic templates have merged
22:43:10 https://review.openstack.org/512637 implements the py27 jobs directly
22:43:33 https://review.openstack.org/513766 is for the feature/zuulv3 branch (and py35 jobs)
22:44:22 we should get those merged ASAP to have py27 and py35 jobs running
22:44:50 we should probably dust off the zuul-nodepool integration job when we have a few mins too.
22:45:13 jeblair: yeah. i want to move to the non-legacy devstack job too. but first things first
22:45:16 are no jobs running now because we are in an in between state?
22:45:29 Shrews: ++
22:45:30 clarkb: dsvm jobs are, but yeah
22:45:48 * clarkb makes note to review those after the meeting
22:45:58 clarkb: many thanks!
22:46:43 #topic stable branch issues
22:47:16 in tracking down some of the issues on the etherpad, i realized there are some issues with zuul's configuration system and stable branches
22:47:33 at the end of last week, i wasn't even in a position to articulate what the issues were
22:48:00 but i took a moleskine with me on a hike this weekend to try to work through it
22:48:37 * dmsimard googles moleskine
22:49:08 and i think i've got a handle on them. i think there are 5 related problems, and they all pretty much need to be solved simultaneously.
22:49:13 dead-tree notebook
22:49:39 i think i also just about have solutions to them as well
22:49:45 dmsimard: jeblair is old school. you should ask to see his camera sometime :)
22:49:53 or at least, a first pass at a solution
22:50:20 Shrews: it's alright, no problem with that. English is not my first language so sometimes there's one of those new words... :)
22:50:30 * Shrews hopes jeblair got some good photos on the hike
22:50:54 my hope is to finish working through this, and then describe the problems along with some proposed solutions
22:51:43 You're keeping us in suspense
22:51:45 i've dug into this now, because i think as soon as people really try to use some of the branch stuff in earnest, we're going to hit problems that don't currently have good solutions.
22:52:06 Telling us there's a problem, that you have a solution but no details :(
22:52:23 i don't have a solution. i think i'm close to having a solution.
22:52:38 i could describe everything now, but i feel like i'd be wasting people's time.
22:53:06 Ok, so, anything to look out for?
22:53:29 basically, i think i should at least be able to describe the problems fully before i waste anyone else's time.
22:54:17 but i wanted to mention it so that folks know there's some undesirable behavior around stable branches
22:54:31 and if you see any issues related to that, send them to me
22:54:43 Ack.
22:54:47 i was not aware of issues in that area, but will keep an eye out now. thx for the info
22:55:00 i'm hoping i'll be able to write up my findings by tomorrow.
22:55:22 in like, legible electronic form. not what's in the moleskine.
22:55:47 the margin of your moleskine is too narrow to contain the solution?
22:55:48 would be interesting to see them however :)
22:55:53 you could send copies via snail mail
22:56:45 addressed to clark boylan, c/o convention center, sydney, nsw, australia
22:57:01 better send it air mail
22:57:11 #topic even more open discussion
22:57:20 anything else?
22:57:44 I'm gonna pop out for a bit after meeting then will be back to review nodepool job changes and help with executor restarts
22:57:45 was going to ask about powering down zuulv2 servers, but maybe that is for tomorrow infra meeting
22:58:08 pabelanger: yeah, let's check in then
22:58:08 i missed an opportunity to exercise my latin: hanc marginis exiguitas non caperet
22:58:23 ++
22:58:29 oh, open floor
22:59:02 pabelanger and I this morning hosted an informal Q&A/ask me anything on TripleO and Zuul v3 with about two dozen developers involved in TripleO and its CI
22:59:25 dmsimard: nice! you seem to have survived?
22:59:37 happy to report that we got a good turn out and I think we managed to fend off some worries/frustrations with v3 and told them it was awesome
22:59:56 it was recorded (red hat bluejeans) and shared with the folks that couldn't attend
23:00:00 would it be helpful to properly bubble that feedback upstream?
23:00:27 yeah, even if it's stuff we know, would still be good to know what the initial roadblocks are for folks.
23:00:27 there's an unorganized etherpad https://etherpad.openstack.org/p/migrating-tripleo-zuulv3
23:00:32 (storyboard stories or whatever)
23:00:50 It was more about education than missing features or blockers, most of them had not even been interested in v3 at all
23:01:19 which makes sense if they were also not especially interested in v2
23:01:26 general interest in zuulv3 is a recent phenomenon. :)
23:01:29 The topic of artifacts did come up and I very briefly discussed that with jeblair
23:02:07 I think the moment we were able to convince them that v3 is able to make their lives better and easier, you could see the sparkles in their eyes
23:02:15 "artifacts" as in being able to pass build artifacts from one job to another?
23:02:20 fungi: yes
23:02:24 Yah, could be turned into a FAQ for sure
23:02:25 cool
23:02:28 dmsimard: nice! thanks. i'll give the etherpad a once over.
23:02:33 i think we're a bit over time...
23:02:36 so let's
23:02:38 #endmeeting