19:03:56 <fungi> #startmeeting infra
19:03:57 <openstack> Meeting started Tue Nov 29 19:03:56 2016 UTC and is due to finish in 60 minutes.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:00 <openstack> The meeting name has been set to 'infra'
19:04:04 <jlvillal> o/
19:04:04 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:05 <olaph> o/
19:04:15 <fungi> #topic Announcements
19:04:22 <rcarrillocruz> o/
19:04:35 <fungi> i don't have any important announcements this week
19:04:43 <fungi> as always, feel free to hit me up with announcements you want included in future meetings
19:04:51 <fungi> #topic Actions from last meeting
19:05:22 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-11-22-19.02.log.html
19:05:32 <fungi> #undo
19:05:33 <openstack> Removing item from minutes: <ircmeeting.items.Link object at 0x7fbb93c8f550>
19:05:41 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-11-22-19.02.html
19:05:57 <fungi> #action fungi send summit session summary to infra ml
19:06:00 <fungi> i swear it's coming
19:06:16 <zara_the_lemur__> hahaha
19:06:25 <fungi> i have more than half of them written up
19:06:28 <fungi> just need to finish
19:06:31 <zara_the_lemur__> the wait makes it more exciting!
19:06:33 <fungi> #topic Specs approval
19:06:50 <fungi> #info APPROVED "Gerrit 2.13 Upgrade" spec
19:07:10 <fungi> #link http://specs.openstack.org/openstack-infra/infra-specs/specs/gerrit-2.13.html Gerrit 2.13 Upgrade
19:07:14 <mordred> p/
19:07:20 <zaro> o/
19:07:42 <fungi> that also got added as a priority effort, since there were no objections
19:08:05 <fungi> though the post job hasn't finished running for that one yet
19:08:18 <fungi> should show up in the priority list shortly
19:08:21 <SotK> \o/
19:08:39 <fungi> i haven't approved "Zuul v3: Add section on secrets" yet since i'm unclear on whether clarkb wanted that updated
19:08:52 <fungi> but we have time to work it out in this meeting
19:09:25 <fungi> #link https://review.openstack.org/386281 "Zuul v3: Add section on secrets"
19:09:43 <zaro> i'm sorta stuck on doing gerrit upgrade tasks atm, can discuss later in meeting though.
19:09:57 <fungi> zaro: cool, i'll add that in the priority efforts discussion shortly
19:10:03 <clarkb> ohai sorry got distracted but am here now
19:10:15 <clarkb> fungi: I responded to your comment in -infra
19:10:19 <clarkb> said I was fine with a follow-up
19:10:47 <fungi> jeblair: follow-up to implement clarkb's requests is cool then? i'll go ahead and approve now
19:10:57 <rcarrillocruz> \o/
19:11:06 <fungi> thanks clarkb!
19:11:50 <fungi> #info APPROVED "Zuul v3: Add section on secrets" change
19:12:29 <fungi> #topic Priority Efforts: Gerrit 2.13 Upgrade (zaro)
19:12:57 <jeblair> fungi: yes, i will implement at least one of those suggestions; i will give the one which is slightly redundant a shot but no promises.  :)
19:13:42 <zaro> i'm sorta stuck on this change, #link https://review.openstack.org/#/c/38674
19:13:51 <jeblair> that number doesn't look long enough
19:14:19 <zaro> right, it's #link https://review.openstack.org/#/c/386745
19:14:46 <fungi> #link https://review.openstack.org/386745 Allow zuul launcher and scheduler on the same host
19:15:13 <zaro> so i'm trying to set up zuul-dev so that we can test new gerrit with a zuul CI in place.
19:16:04 <zaro> had a discussion with nibalizer yesterday about changes to puppet-openstackci to make that work, which may cause problems for downstream.
19:16:19 <zaro> would like to get other people's opinions about this
19:16:27 <pabelanger> maybe it is just easier to stand up separate dev servers?
19:16:51 <pabelanger> moving our configs to a single host is a lot of code churn
19:16:55 <nibalizer> o/
19:17:07 <clarkb> pabelanger: zaro but openstackci is already supposed to support all-in-one aiui
19:17:16 <rcarrillocruz> y
19:17:20 <clarkb> so it's a bug that should be fixed regardless
19:17:24 <fungi> but we purposely didn't implement zuul-launcher support in it, i thought
19:17:28 <pabelanger> clarkb: there are some issues with zuul v2.5 things
19:17:29 <zaro> clarkb: fix in openstackci?
19:17:40 <nibalizer> yah the issue is the decision not to support z2.5 imho
19:17:43 <clarkb> fungi: is that the only piece that isn't working?
19:17:48 <nibalizer> puts openstackci in a weird spot
19:17:54 <pabelanger> which, if we land in openstackci, will break system-config
19:18:05 <clarkb> pabelanger: not sure I follow
19:18:06 <pabelanger> we can fix it in openstackci, but it will be a lot of churn
19:18:06 <nibalizer> because previously we said that we'd keep openstackci running what we run
19:18:21 <mordred> yah. I agree that is awkward
19:18:31 <jeblair> we can move it into openstackci.  but i still don't think we should recommend anyone use it.
19:18:48 <pabelanger> https://review.openstack.org/#/c/311140/
19:19:01 <pabelanger> was a WIP for zuul_launcher in openstackci
19:19:08 <fungi> aiui, the challenge comes from directly instantiating zuul::launcher in openstack_project but instantiating other puppet classes for zuul through the openstackci module
19:19:12 <pabelanger> but was abandoned
19:20:09 <nibalizer> yah
19:21:06 <jeblair> so if it will help, let's move it into openstackci.
19:21:06 <nibalizer> if openstackci can hold launcher stuff, it gets easier to write it I think
19:21:12 <nibalizer> i do think it will help jeblair
19:21:50 <nibalizer> maybe we leave the default to be jenkins, then we add some flags that turn on a launcher
19:22:06 <jeblair> yep.  with a comment saying don't do that.  :)
19:22:07 <nibalizer> and infra uses that on the zuul launchers, and in zaro's all-in-one test node
19:22:23 <jeblair> do not press big red button
19:22:29 <nibalizer> no one ever does that :P
19:23:20 <pabelanger> I'll be around to help with the move
19:23:26 <zaro> i'm going to trust that it will help
19:23:50 <zaro> how would that look on the system-config side? just call into openstackci pieces to set up zuul?
19:24:03 <nibalizer> for the dev node or for the launchers?
19:24:05 <zaro> one in prod mode and one in test mode?
19:24:47 <zaro> I mean how does that separation work with zuul_dev.pp and zuul_prod.pp?
19:24:50 <pabelanger> zaro: we should be able to follow same steps for zuul_merger, but using zuul_launcher
19:25:27 <pabelanger> then add openstackci::zuul_launcher into zuul_dev.pp
19:25:57 <nibalizer> yah
19:25:59 <fungi> i guess some of the confusion is also around how to serialize the needed changes between the different puppet modules
19:26:09 <pabelanger> if we leave zuul_prod.pp alone for now, that will be less disruption
19:26:10 <fungi> or has that been addressed now?
19:26:26 <nibalizer> and we write openstackci::zuul_launcher so that it can co-exist with openstackci::zuul_scheduler
19:26:39 <pabelanger> yes
19:26:49 <clarkb> fungi: I don't think it's an issue, if you don't explicitly use openstackci::zuul_launcher it's a noop
19:27:00 <fungi> good point
19:27:04 <clarkb> fungi: which will let our existing code be happy, which we can then switch over to use openstackci::zuul_launcher
19:27:09 <nibalizer> yah i don't expect us to have to change zuul_prod.pp at all
19:27:10 <fungi> i guess that directly addresses it anyway
19:27:21 <fungi> yep
19:28:43 <nibalizer> oh now i see what clark is saying
19:28:54 <nibalizer> there should be a change to the launchers puppet, yes, but it shouldn't change the daemon
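(A minimal sketch of the class shape being discussed — only the name openstackci::zuul_launcher and the "default to jenkins, add a flag that turns on a launcher" idea come from the conversation above; the parameter name and the assumption that puppet-zuul exposes a zuul::launcher class are illustrative, not the eventual implementation:)

    class openstackci::zuul_launcher (
      # hypothetical flag; defaults off so existing jenkins-based
      # deployments are unaffected -- "do not press big red button"
      $enable_launcher = false,
    ) {
      if $enable_launcher {
        # delegate to the launcher class assumed to exist in puppet-zuul,
        # mirroring how the discussion says zuul_merger is handled
        include ::zuul::launcher
      }
    }

    # hypothetical wiring in zuul_dev.pp, alongside the existing scheduler:
    class { 'openstackci::zuul_launcher':
      enable_launcher => true,
    }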
19:29:50 <fungi> anything else need covering on the gerrit upgrade prep at this point?
19:30:04 <fungi> zaro: are you all squared away on what you need for the puppet end of this?
19:30:13 <zaro> will pabelanger work on the needed zuul_launcher change?
19:30:28 <zaro> his wip change
19:30:37 <zaro> or abandoned i guess
19:31:02 <pabelanger> I can help get it landed
19:31:11 <pabelanger> don't mind doing it
19:31:32 <zaro> cool, thanks!
19:32:40 <fungi> #action pabelanger add openstackci::zuul_launcher puppet class
19:32:45 <fungi> that cover it?
19:32:56 <pabelanger> WFM
19:32:56 <zaro> i think so.
19:33:03 <fungi> awesome
19:33:17 <fungi> thanks zaro, pabelanger, nibalizer, clarkb et al!
19:33:37 <fungi> #topic Open discussion
19:33:50 <fungi> anybody have anything off-agenda to bring up this week?
19:34:01 <pabelanger> we had an outage of ask.o.o over the weekend
19:34:07 <fungi> ahh, yep. so we did
19:34:13 <pabelanger> HDD space issue
19:34:29 <pabelanger> between apache and jetty, about 30GB of logs
19:34:35 <fungi> there's a bigger issue with ask.o.o... there's an outstanding upgrade to newer askbot
19:34:35 <mordred> I recommend we stop running out of disk
19:34:48 <pabelanger> if anybody knows jetty, would love some help with how to reduce its log levels
19:34:58 <fungi> i approved it; i think it didn't work because there's some issue with the deployment automation for it, so i had to roll it back
19:35:20 <clarkb> fungi: for that issue at least we should probably just use vcsrepo
19:35:22 <fungi> i'm not entirely convinced the uptick in jetty log volume is unrelated
19:35:30 <clarkb> it's what we use everywhere else because !vcsrepo has never worked for us
19:36:18 <fungi> yeah, i think the commit history in the puppet-askbot repo indicates that the switch from vcsrepo to git provider was an attempt to solve an issue/shortcoming with vcsrepo
19:36:53 <fungi> #link https://review.openstack.org/235389 Refactor vcsrepo to git resource to resolve refresh trigger issues
19:37:07 <fungi> vcsrepo was retriggering deployment on every run or something
19:37:21 <clarkb> ah
19:37:48 <fungi> trying to ensure a specific git ref caused it to notify each time the resource was evaluated, if memory serves
19:37:57 <fungi> mrmartin may remember more detail
19:38:22 <mrmartin> vcsrepo?
19:38:27 <fungi> right
19:38:48 <mrmartin> yeah, we had some issues with vcsrepo
19:38:53 <fungi> discussing the decision from a year ago to switch from vcsrepo to git resource for puppet-askbot
19:39:25 <fungi> which in turn seems to have the opposite problem of not notifying when the ref changes
19:39:31 <mrmartin> some features were missing from the git resource as I remember
19:40:49 <fungi> yeah, it appears that ensuring a specific git ref with vcsrepo notifies other resources every time it's evaluated, and ensuring a specific git ref with the git resource fails to notify other resources when the git ref changes
19:40:55 <clarkb> so vcsrepo triggers things on every run, and the current thing doesn't actually update the git repo?
19:41:22 <mrmartin> give me a min I can check the task history.
19:41:57 <mordred> clarkb: it updates the git repo but doesn't notify anything else that the git repo changed
19:42:07 <mordred> clarkb: so actions that trigger when the git repo changes don't fire
19:42:14 <fungi> yeah, now i don't remember which of those it ended up being
19:42:33 <fungi> i'd have to go back to my post-mortem in the irc logs from when i rolled things back
19:42:36 <clarkb> ah ok so I misunderstood when this was debugged before and thought the git repo didn't update
19:43:17 <mrmartin> https://review.openstack.org/#/c/235389/3
19:43:23 <fungi> mrmartin: the reason this is coming up though is that we've (suddenly?) started running out of disk space in the last few weeks, and jetty logs look to be what's mostly filling up the filesystem
19:43:29 <mordred> clarkb: I could also be massively wrong - it's easy to get backwardsed on this
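(To illustrate the two behaviors being compared — the path, source URL, and exec name below are hypothetical; the notify-on-every-run behavior is what the discussion reports for puppet-askbot, not something verified here:)

    # vcsrepo variant: keeping a branch up to date reportedly sends a
    # refresh to the subscribed exec on (nearly) every agent run
    vcsrepo { '/srv/askbot':                          # hypothetical path
      ensure   => latest,
      provider => git,
      revision => 'master',
      source   => 'https://example.org/askbot.git',   # hypothetical URL
      notify   => Exec['deploy-askbot'],              # hypothetical exec
    }
    # the git-resource replacement from 235389 is reported to have the
    # opposite problem: it updates the checkout but never fires
    # Exec['deploy-askbot'] when the ref changes, so the redeploy step
    # silently doesn't run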
19:43:41 <fungi> though i don't think anyone has yet looked into why jetty is logging so much lately
19:43:44 <jeblair> yeah, it looks like nb01 installs nodepool every puppet run
19:43:48 <mrmartin> jetty?
19:43:55 <mrmartin> we are using jetty for apache solr
19:44:08 <jeblair> so vcsrepo appears to do that when you give it a branch at least
19:44:10 <fungi> mrmartin: right, i was wanting to rule out the recent failed and rolled-back upgrade as a cause
19:44:25 <mrmartin> the current solr version seems to be deprecated, but upgrading requires a larger architectural change of the solr deployment.
19:45:07 <mrmartin> vcsrepo had a bug, and that was the reason we changed to this git resource
19:45:10 <fungi> mrmartin: and if the volume of jetty logging isn't indicative of a deeper problem, we probably just need to look into how to scale back on its retention or configure it to compress its logs when it rotates them
19:45:31 <jeblair> oh wait, i think i may be wrong about that
19:45:40 <jeblair> i see puppet runs where nodepool was not installed
19:45:58 <mrmartin> fungi: yeah, as a first step it is a great idea. I don't have access to the ask.o.o site, just the dev one, and that doesn't reproduce the same issues due to much lower traffic.
19:46:26 <jeblair> mordred, clarkb, fungi: so it looks like vcsrepo, given a branch, does not reinstall on every run.
19:46:41 <mrmartin> has someone had time to check what the heck jetty is logging?
19:46:43 <fungi> mrmartin: i can get you some jetty logs after the meeting if you have an opportunity to look at them
19:46:51 <mrmartin> ok, I can check it
19:47:01 <fungi> i'm pulling some up now
19:47:12 <mrmartin> maybe we just simply need to decrease the log level, and add logrotate
19:47:29 <pabelanger> yes
19:47:34 <pabelanger> how do you do that?
19:47:38 <pabelanger> because we need to puppet it
19:47:43 <fungi> lots of info lines about things like "org.apache.solr.core.SolrCore  - [core-en] webapp=/solr path=/select/ params={fl=*+score&start=390&q=(vnc)&wt=json&fq=django_ct:(askbot.thread)&rows=10} hits=489 status=0 QTime=0"
19:47:48 <fungi> looks like maybe performance stats?
19:48:56 <fungi> i honestly don't know enough about jetty and solr to be able to say for sure, but the fact that they're pretty much all at info loglevel suggests they're benign anyway
19:49:17 <fungi> so maybe jetty can be configured to compress when it rotates
19:49:32 <pabelanger> think it can
19:49:44 <mrmartin> yeap, let's try that first; at INFO level it shouldn't flood the logs
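(One low-churn way to get compression on rotation is a logrotate stanza managed from puppet — a sketch assuming jetty writes under /var/log/jetty, which hasn't been checked:)

    file { '/etc/logrotate.d/jetty':
      ensure  => present,
      # copytruncate avoids having to signal jetty on rotation, at the
      # cost of possibly losing a few lines in the window
      content => "/var/log/jetty/*.log {\n    weekly\n    rotate 4\n    compress\n    delaycompress\n    missingok\n    notifempty\n    copytruncate\n}\n",
    }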
19:49:56 <mrmartin> how quickly does it fill up the disk?
19:50:07 <ianw> wow xml config files are fun
19:51:42 <fungi> mrmartin: looking for a graph, just a sec
19:52:03 <fungi> mrmartin: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2547&rra_id=all
19:52:21 <fungi> #link http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2547&rra_id=all ask.openstack.org - Used Space - /
19:52:30 <mrmartin> oh ok, thanks
19:52:38 <fungi> looks like a few weeks ago it ramped up pretty quickly
19:52:48 <bswartz> ianw hope you're being sarcastic
19:53:03 <fungi> mrmartin: those sawteeth are from where we deleted some older logs manually each time it filled up
19:53:30 <fungi> mrmartin: and when / gets full and puppet tries to redeploy askbot for whatever reason, it leaves askbot in a half-installed state and then the service apparently falls offline
19:53:48 <mrmartin> wow
19:54:01 <fungi> each of those incidents was followed by trying to figure out which libraries to manually reinstall with pip to get things working again
19:55:14 <pabelanger> one day, when we roll DIBs for control plane servers, we should have a separate mount for /var/log :)
19:55:19 <pabelanger> so that doesn't happen
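(e.g. a dedicated volume mounted at /var/log in the image's fstab — device name and filesystem are placeholders:)

    /dev/vdb1  /var/log  ext4  defaults,nofail  0  2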
19:57:07 <fungi> yeah
19:57:36 <fungi> okay, we're coming up on the end of our timeslot. any last-minute items?
19:58:40 <clarkb> translate-dev still not working
19:58:46 <clarkb> er on xenial
19:58:51 <clarkb> will try to poke at that more later this week
19:59:15 <clarkb> also puppet + systemd is
19:59:19 <clarkb> how to put it nicely
19:59:22 <clarkb> not great
19:59:25 <EmilienM> :)
19:59:29 * fungi is unsurprised
19:59:51 <fungi> okay, that's time. thanks everyone!
19:59:56 <fungi> #endmeeting