22:06:10 <jeblair> #startmeeting zuul
22:06:10 <dmsimard> ok /me scratches it off
22:06:10 <openstack> Meeting started Mon Nov 27 22:06:10 2017 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:06:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:06:13 <openstack> The meeting name has been set to 'zuul'
22:06:56 <jeblair> dmsimard: it's also a grey area :)  not completely out of scope, but perhaps not intrinsic to zuul itself
22:07:10 <jeblair> dmsimard: maybe let's stick it at the end
22:07:14 <jeblair> which it already is
22:07:24 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:07:33 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-11-20-22.07.html
22:07:35 <dmsimard> jeblair: the meeting participants are largely the same, let's talk about it tomorrow :p
22:07:48 <jeblair> #topic Roadmap
22:08:07 <jeblair> based on a discussion right before summit, i've moved our roadmap into storyboard
22:08:16 <jeblair> #link roadmap in storyboard https://storyboard.openstack.org/#!/board/53
22:08:26 <jeblair> i *just* finished typing everything in
22:08:35 <fungi> still has that new board smell
22:08:40 <jeblair> that's one story for each line in the roadmap email i sent
22:09:08 <jeblair> in most cases, i just put in a bare-bones description of what the item is
22:09:23 <pabelanger> great
22:09:24 <Shrews> that must have been fun data entry
22:09:26 <jeblair> if you have a particular interest in an item, or know what it means, please do update the descriptions with more explanation
22:09:27 <fungi> perhaps a meta-item, but at what point do we merge back to master?
22:09:51 <jeblair> fungi: good question, let's talk about that next
22:09:57 <fungi> k
22:10:15 <fungi> wondered if it was missing from the roadmap, or at a different level than the roadmap
22:10:17 <jeblair> i *almost* finished assigning folks to the tasks i know about
22:10:43 <jeblair> please assign yourself if you're working on something with no assignee
22:10:58 <jeblair> that's a fully automatic board
22:11:37 <jeblair> the entries are all stories, and the rows are generated by queries based on tags
22:11:47 <jeblair> so a story tagged with zuulv3.0 will appear in the first column
22:11:53 <jeblair> "zuulv3.1" in the second
22:12:03 <jeblair> and "zuulv3.x" in the third
22:12:35 <jeblair> i had one or two more categories in the email, i'm not sure if it's useful to break it down further or not...
22:12:52 <clarkb> o/
22:12:59 <dmsimard> fwiw I made progress on the json/finger:// issue last week. I have a huge draft email I was going to send out, but I figured I'd sleep on it and take another look this week; I'll add a comment on the story.
22:13:39 <jeblair> i'd love to have "3.0 in progress" and "3.0 todo", but that would require the kind of script we were using for the initial zuulv3 board, since storyboard can't do that query natively.
22:13:54 <jeblair> if folks think that's really important, i can set that up.  it's not too hard.
22:14:24 <jeblair> dmsimard: what's the json/finger:// issue?
22:14:46 <dmsimard> The zuul_json truncation problem that leads to a finger:// URL
22:14:53 <jeblair> dmsimard: ah cool
22:15:31 <dmsimard> I got rabbit-holed into trying to understand how we could avoid finger:// URLs altogether but it's a bit complicated :)
22:16:06 <jeblair> dmsimard: you're talking about the issue where zuul_json is broken, yeah?
22:16:15 <jeblair> dmsimard: https://storyboard.openstack.org/#!/story/2001329 ?
22:16:21 <dmsimard> yeah, due to a callback bug with some tasks
22:16:29 <dmsimard> yes
22:16:36 <dmsimard> I'm writing a comment there to describe the issue a bit right now.
22:17:13 <jeblair> okay.  my preference would be to fix that bug and not focus on the way it manifests as an error.
22:17:29 * dmsimard nods
22:18:12 <jeblair> we should probably come up with a process for adding new things
22:18:30 <dmsimard> I have access to a zuul v3 instance now (thanks tristanC and team); it's easy to reproduce, so it's just a matter of having time to spend on it
22:19:37 <jeblair> i don't have a suggestion for a process for adding new items to the roadmap at the moment, so for now, maybe let's just discuss things in #zuul
22:19:44 <jeblair> we'll come up with something later
22:20:08 <jeblair> i see the 'release 3.0' lane as a burn-down list for us to do the 3.0 release
22:20:37 <jeblair> so if there's something there you can help with, please do -- either by assigning yourself and hacking on it, or reviewing changes related to it
22:21:03 <jeblair> anything else about the roadmap?
22:21:59 <jeblair> please let me know if it's useful, or not useful.
22:22:06 <jeblair> #topic merging into master
22:22:39 <jeblair> i think last week the topic of merging into master came up, and there was some support
22:22:47 <fungi> asap? immediately prior to tagging 3.0.0? somewhere in between?
22:23:37 <clarkb> I think it would be good to get done as soon as reasonably possible. The bulk of the 3.0 work is done and the master branches have largely been dormant. This will make it clear release is imminent and that future dev focus is on the newer stuff
22:23:48 <jeblair> i'm inclined to merge asap -- at least in my mind, the criteria for merging into master was when infra was running v3, and shortly before the release.
22:23:58 <clarkb> it will force us to sort out some of the deployment tooling transition stuff too which is good
22:24:01 <fungi> and do we preemptively branch a stable/2.x from current master, or just tag it 2.6.1 and then branch when we find we need to append there?
22:24:01 <dmsimard> Is a merge even possible? Or is that a git push --force kind of thing?
22:24:12 <clarkb> fungi: I think we just tag, and branch later if we need to
22:24:19 <jeblair> dmsimard: 'git merge --strategy ours' or however you spell that
22:24:31 <jeblair> clarkb: ++
22:24:31 <pabelanger> for nodepool, merging back to master will make things a little easier upgrading our nodepool-builders. So +1 for that, but we can use feature/zuulv3 and do the dance with patches
22:24:37 <fungi> dmsimard: a "merge" is always possible, if we choose the right strategy ("ours" will basically be an overwrite)
22:25:05 <jeblair> pabelanger: yeah, i think this conversation applies to both repos
22:25:05 <dmsimard> TIL
22:25:27 <fungi> right, i assumed we were talking about merging in zuul and nodepool repos more or less synchronously
22:25:29 <dmsimard> will come in handy because I'll need to merge back feature/1.0 into master for ARA as well, thanks for that :D
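    (A minimal sketch of the "ours" merge described above, assuming the feature branch is named feature/zuulv3; the exact branch names and commit message are illustrative, not from the meeting:)

        # on the feature branch, record a merge that keeps the feature branch's tree
        git checkout feature/zuulv3
        git merge -s ours master -m "Merge master into feature/zuulv3"
        # master is now an ancestor of the feature tip, so it can fast-forward
        git checkout master
        git merge feature/zuulv3

    (The net effect is what fungi describes: master ends up with exactly the feature branch's content, with history intact and no force-push.)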
22:25:38 <SpamapS> +1 for merge to master soon. Do we have any in-progress fixes for 2.x?
22:26:01 <jeblair> SpamapS: not that i'm aware of -- all the urgent gerrit upgrade stuff has merged into master and made it into a release
22:26:11 <clarkb> and the jenkins + nodepool stuff merged iirc
22:26:44 <fungi> #link https://review.openstack.org/#/q/project:openstack-infra/zuul+is:open+branch:master
22:27:10 <jeblair> probably the next step is to start an infra mailing list thread about it, to make sure we coordinate with folks deploying from master
22:27:15 <jeblair> i'm happy to start that
22:27:25 <fungi> we have _lots_ of open zuul repo changes for master, looks like
22:27:54 <SpamapS> jeblair: indeed. A long runway for them to speak up seems like a good idea.
22:27:57 <jeblair> i think after that we need to sort out the puppet-openstackci deployment bits.  i'd love it if someone else volunteered to drive that.
22:28:34 <jeblair> i don't think it's that hard (we discussed driving everything from a zuulv3 switch argument), but some care will need to be taken.
22:28:59 <clarkb> and probably want to have at least one third party ci group/individual following along for early feedback
22:29:15 <jeblair> i'll ask for a volunteer for that on the mailing list
22:29:16 <SpamapS> For the open changes.. perhaps a script that automatically WIP's them all with a friendly note? I mean, they'll likely merge conflict anyway, so it might be the nice thing to do to let authors know why their change is stagnating and unlikely to move forward.
22:30:02 <fungi> i or anyone can take care of mass-wip'ing, that's trivial via gertty (no need for a script)
22:30:22 <fungi> just process-mark all changes matching the above query and then review them all in one shot
22:30:51 <SpamapS> \o/ gertty
22:31:14 <clarkb> ttx forwarded someone to us during openstack day france who was interested in getting involved in the more sysadminy bits. Possibly they would be interested in working through some of the puppet for this
22:31:23 <fungi> well, any zuul-core reviewer can at any rate
22:31:30 <clarkb> That's kind of jumping into the deep end, but probably in a good way :)
22:31:50 <jeblair> #action jeblair start email thread about merging into master
22:32:15 <clarkb> I'll try to get them on the mailing list and lurking in irc and see what they are interested in poking at
22:32:31 <jeblair> fungi: yeah, it may be worth taking a pass through and triaging some of those into "handled by zuulv3", "irrelevant in zuulv3", "please re-propose to zuulv3"
22:32:34 <fungi> while i think it's great if someone new wants to handle the puppet module updates, that may be an extraordinary amount of pressure since it's a task which is blocking some time-sensitive roadmap actions
22:33:04 <pabelanger> I can likely start work on puppet-openstackci, if somebody else doesn't do so before me
22:33:10 <clarkb> fungi: ya that's true, but I also think it will likely largely be mechanical puppet api updates which, if you're familiar with puppet, should be straightforward
22:33:31 <clarkb> (if not familiar with puppet then likely not a good first task)
22:33:43 <fungi> if it turns out to be a fairly trivial amount of work there, then i agree that would be a great way to get to know the rest of the team ;)
22:34:45 <jeblair> since we're already in the openstack / zuul grey area...
22:34:51 <jeblair> #topic Update ARA version on executors
22:35:34 <clarkb> is ara something we should just ensure latest on the executors?
22:35:53 <jeblair> #link https://review.openstack.org/516740
22:35:58 <jeblair> apparently this is tricky in puppet
22:36:24 <dmsimard> I meant to ask the TC if ara was something we could put in upper constraints, however it wouldn't really make sense in the context of zuul
22:36:29 <clarkb> the trick is to just call pip directly
22:36:42 <clarkb> with an onlyif clause that checks the version of ara against what is available
22:37:29 <clarkb> I can help with that if necessary
22:37:31 <dmsimard> I have to step away momentarily :( but I wanted to unblock that review somehow because there are some fixes in there we want (such as the firefox permalink issue)
22:37:35 <dmsimard> brb
22:37:36 <jeblair> it's also possible to solve this by declaring ara an extra dependency of zuul, but i'd prefer to avoid that and actually generalize the callback mechanism anyway.
22:37:55 <jeblair> so if there's a way to fix 516740, that's what i'd prefer
22:38:10 <jeblair> clarkb: can you take that over then?
22:38:10 <dmsimard> jeblair: I've already cut down the amount of deps in 1.0 (at least 3 dependencies gone) and started splitting the components
22:38:10 <clarkb> but basically an exec resource that calls pip with all the necessary options, then an onlyif parameter (getting that check right is likely the trickiest bit)
22:38:12 <dmsimard> so it's WIP
22:38:15 <dmsimard> brb
22:38:25 <clarkb> jeblair: ya I can work on a new patchset to do the thing I describe
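    (One possible shape of the check clarkb describes, written as the shell commands a puppet exec/onlyif pair might run; the PyPI JSON endpoint and the exact comparison are assumptions about the eventual patchset, not the actual change:)

        # onlyif: exit 0 (and so trigger the exec) only when the installed
        # ara version differs from the latest release on pypi
        installed=$(pip show ara 2>/dev/null | awk '/^Version:/ {print $2}')
        latest=$(curl -s https://pypi.org/pypi/ara/json | \
            python -c 'import json,sys; print(json.load(sys.stdin)["info"]["version"])')
        [ "$installed" != "$latest" ]

        # command: the actual upgrade, run only when the check above succeeds
        pip install --upgrade ara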
22:38:31 <jeblair> clarkb: thanks!
22:38:43 <jeblair> #action clarkb fix https://review.openstack.org/516740 to call pip directly
22:39:08 <jeblair> #topic Open Discussion
22:39:21 <jeblair> anyone have anything else?
22:39:28 <clarkb> is there any outstanding cleanup work?
22:39:38 <clarkb> deleting servers, removing dead puppet code, etc?
22:39:54 <fungi> i almost have an etherpad with a list of the v2 (master) changes proposed for zuul and nodepool
22:40:19 <fungi> #link https://etherpad.openstack.org/p/zuulv2-outstanding-change-triage
22:40:25 <fungi> that's where it'll appear in a few minutes
22:40:35 <jeblair> clarkb: i think there are still some disabled tests in zuul
22:41:00 <jeblair> clarkb: i can't think of examples of the 2 things you cited
22:41:10 <pabelanger> I'd like to get a few eyes on https://review.openstack.org/521324/ ("Add support for shared ansible_host in inventory") now that dmsimard has added a few notes
22:41:25 <pabelanger> clarkb: we likely can delete the release slaves, I can look at that tomorrow
22:41:53 <Shrews> just an FYI for folks, i'm fairly certain the finger daemon dying on the executor VMs is because of OOM issues. zuul-executor is chosen for killing, but it's actually the child of that process (the finger daemon) that gets killed
22:42:09 <Shrews> this was discussed in #zuul, but mentioning here for wider exposure
22:42:14 <pabelanger> We also restarted zuulv3 late last week, to address memory issues. It was just about to hit 15GB of RAM
22:42:35 <jeblair> Shrews: now that i think about that more -- i wonder why zuul is using so much memory... i mean, ansible is doing all the work
22:43:08 <jeblair> Shrews: you're fairly certain it's the main process that's using lots of ram?
22:43:50 <jeblair> Shrews: oh, i wonder if there's some ansible output buffering the executor is doing that's using lots of ram
22:44:30 <jeblair> pabelanger: how long had it run, do you know?
22:44:32 <Shrews> jeblair: yes. kern.log outputs the mem usage of things before killing
22:45:11 <jeblair> Shrews: know how much it was using in absolute terms?
22:45:18 <fungi> okay, that etherpad linked earlier is now complete: 133 open zuul changes and 65 open nodepool changes on master at the moment
22:45:31 <pabelanger> jeblair: about a week I think, that was the last time I had restarted it
22:45:44 <fungi> also worth noting, we still have a couple open changes on nodepool's feature/gearman-zk-shim branch. presumably those can go away?
22:45:53 <jeblair> pabelanger: hrm.  we should be able to do better than that.
22:45:54 <pabelanger> jeblair: but, I can get a more exact number
22:46:10 <Shrews> jeblair: http://paste.openstack.org/raw/627522/
22:46:15 <Shrews> that was from ze4
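    (For reference, one way to pull the same OOM details out of an executor's logs; the log path assumes a typical ubuntu host:)

        # who invoked the oom killer, and which process was actually killed
        grep -E 'invoked oom-killer|Killed process' /var/log/kern.log | tail -n 20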
22:46:19 <jeblair> fungi: yeah, i think so.  we can probably abandon them and delete the branch.  let's double check with mordred.
22:46:25 <pabelanger> jeblair: yah, memory does seem to spike when new patchsets with .zuul.yaml are added
22:46:37 <jeblair> pabelanger: did we have a bunch all at once?
22:46:43 <jeblair> maybe we should graph that as a metric
22:46:43 <pabelanger> this time, it was chef repos that were coming online
22:46:49 <pabelanger> jeblair: yah, about 6 patches
22:46:50 <fungi> jeblair: pabelanger: was the zuul scheduler memory consumption maybe due to someone posting a new flurry of dynamic reconfiguration triggering changes?
22:47:01 <fungi> er, i guess you already went there
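    (Until such a metric is graphed, a quick spot-check of the scheduler's resident memory; the zuul-scheduler process name is an assumption about the deployment:)

        # resident set size of the oldest matching scheduler process, in kB
        ps -o rss= -p "$(pgrep -of zuul-scheduler)"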
22:47:14 <jeblair> Shrews: know the units for that by any chance?
22:47:24 <mordred> jeblair, fungi: we can definitely delete anything on feature/gearman-zk-shim
22:47:29 <mordred> fungi: I can abandon those right now
22:47:43 <jeblair> mordred: want to go ahead and delete the branch?
22:47:45 <fungi> thanks mordred!
22:48:01 <jeblair> anyone object to deleting the feature/gearman-zk-shim branch?
22:48:08 <Shrews> jeblair: i do not
22:48:14 <pabelanger> ++
22:48:18 <fungi> far from it, i wholeheartedly endorse deletion of that branch now
22:48:32 <mordred> jeblair: yah - can do
22:48:44 <fungi> having more than one feature branch open for a repo seems like a recipe for problems, to me
22:48:56 <jeblair> Shrews: i think it may be 4k pages
22:49:15 <mordred> fungi, jeblair: done
22:49:41 <fungi> now we just have some ~200 pre-v3 changes to zuul and nodepool to figure out what to do with ;)
22:50:05 <jeblair> Shrews: so that's 4.6 GB of ram.  it seems like a lot.
22:50:10 <clarkb> fungi: we probably want to sort them by features vs bugfixes
22:50:30 <clarkb> fungi: bugfixes consider merging based on severity, features point to v3 things?
22:50:35 <fungi> agreed, some means of grouping/tagging would make sense next
22:50:37 <jeblair> Shrews: may be a leak in the executor?
22:50:52 <Shrews> jeblair: "Nov 23 00:26:22 ze04 kernel: [8494719.679223] git invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0"
22:51:00 <Shrews> jeblair: does that mean a git operation is triggering it?
22:51:44 <fungi> i think it means git asked for some memory, the kernel realized it had none, and then excitement ensued
22:52:02 <fungi> or hilarity, depending on your mood
22:52:12 <jeblair> yeah, so not really git's fault.  just in the wrong place at the wrong time
22:52:53 <Shrews> just wondering if the git operations are using more memory than we expect. at any rate, probably a #zuul discussion
22:53:34 <fungi> probability-wise, the sheer number of git operations means an allocation request from a git process is statistically likely to be the next request
22:54:42 <fungi> your highest-churn processes are the ones most likely to trigger an oom action, but the process using the most memory could easily be something else
22:54:55 <Shrews> *nod*
22:55:27 <fungi> the sacrificed children mentioned in the log are the most likely culprits
22:56:22 <fungi> [Thu Nov 23 00:26:21 2017] Killed process 20080 (zuul-executor) total-vm:850552kB, anon-rss:8900kB, file-rss:6048kB
22:56:29 <Shrews> fungi: in this case, the child was using much less memory than the parent
22:56:56 <Shrews> but the child was chosen, thus no more finger process
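    (To see why the kernel picked the child over the parent, the badness scores can be compared directly; a sketch, assuming both the executor and the finger daemon match on the zuul-executor command line:)

        # higher oom_score means a more likely oom victim
        for pid in $(pgrep -f zuul-executor); do
            printf '%6s score=%s adj=%s\n' "$pid" \
                "$(cat /proc/$pid/oom_score)" "$(cat /proc/$pid/oom_score_adj)"
        done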
22:57:24 * fungi avoids making a lotr reference involving us being down to only 9 fingers
22:58:54 <jeblair> we're about at time
22:58:58 <jeblair> so thanks everyone!
22:59:04 <jeblair> #endmeeting