19:03:56 #startmeeting infra
19:03:57 Meeting started Tue Nov 29 19:03:56 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:58 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:00 The meeting name has been set to 'infra'
19:04:04 o/
19:04:04 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:04:05 o/
19:04:15 #topic Announcements
19:04:22 o/
19:04:35 i don't have any important announcements this week
19:04:43 as always, feel free to hit me up with announcements you want included in future meetings
19:04:51 #topic Actions from last meeting
19:05:22 #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-11-22-19.02.log.html
19:05:32 #undo
19:05:33 Removing item from minutes:
19:05:41 #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-11-22-19.02.html
19:05:57 #action fungi send summit session summary to infra ml
19:06:00 i swear it's coming
19:06:16 hahaha
19:06:25 i have more than half of them written up
19:06:28 just need to finish
19:06:31 the wait makes it more exciting!
19:06:33 #topic Specs approval
19:06:50 #info APPROVED "Gerrit 2.13 Upgrade" spec
19:07:10 #link http://specs.openstack.org/openstack-infra/infra-specs/specs/gerrit-2.13.html Gerrit 2.13 Upgrade
19:07:14 p/
19:07:20 o/
19:07:42 that also got added as a priority effort, since there were no objections
19:08:05 though the post job hasn't finished running for that one yet
19:08:18 should show up in the priority list shortly
19:08:21 \o/
19:08:39 i haven't approved "Zuul v3: Add section on secrets" yet since i'm unclear on whether clarkb wanted that updated
19:08:52 but we have time to work it out in this meeting
19:09:25 #link https://review.openstack.org/386281 "Zuul v3: Add section on secrets"
19:09:43 i'm sorta stuck on doing gerrit upgrade tasks atm, can discuss later in meeting though.
19:09:57 zaro: cool, i'll add that in the priority efforts discussion shortly
19:10:03 ohai sorry got distracted but am here now
19:10:15 fungi: I responded to your comment in -infra
19:10:19 said I was fine with a follow-up
19:10:47 jeblair: follow-up to implement clarkb's requests is cool then? i'll go ahead and approve now
19:10:57 \o/
19:11:06 thanks clarkb!
19:11:50 #info APPROVED "Zuul v3: Add section on secrets" change
19:12:29 #topic Priority Efforts: Gerrit 2.13 Upgrade (zaro)
19:12:57 fungi: yes, i will implement at least one of those suggestions; i will give the one which is slightly redundant a shot but no promises. :)
19:13:42 i'm sorta stuck on this change, #link https://review.openstack.org/#/c/38674
19:13:51 that number doesn't look long enough
19:14:19 right, it's #link https://review.openstack.org/#/c/386745
19:14:46 #link https://review.openstack.org/386745 Allow zuul launcher and scheduler on the same host
19:15:13 so i'm trying to set up zuul-dev so that we can test new gerrit with a zuul CI in place.
19:16:04 had a discussion with nibalizer yesterday about changes to puppet-openstackci to make that work which may cause problems for downstream.
19:16:19 would like to get other people's opinions about this
19:16:27 maybe it is just easier to stand up separate dev servers?
19:16:51 moving our configs to a single host is a lot of code churn
19:16:55 o/
19:17:07 pabelanger: zaro but openstackci is already supposed to support all-in-one aiui
19:17:16 y
19:17:20 so it's a bug that should be fixed regardless
19:17:24 but we purposely didn't implement zuul-launcher support in it, i thought
19:17:28 clarkb: there are some issues with zuulv25 things
19:17:29 clarkb: fix in openstackci?
19:17:40 yah the issue is the decision not to support z2.5 imho
19:17:43 fungi: is that the only piece that isn't working?
19:17:48 puts openstackci in a weird spot
19:17:54 which, if we land in openstackci, will break system-config
19:18:05 pabelanger: not sure I follow
19:18:06 we can fix it in openstackci, but will be a lot of churn
19:18:06 because previously we said that we'd keep openstackci running what we run
19:18:21 yah. I agree that is awkward
19:18:31 we can move it into openstackci. but i still don't think we should recommend anyone use it.
19:18:48 https://review.openstack.org/#/c/311140/
19:19:01 was a WIP for zuul_launcher in openstackci
19:19:08 aiui, the challenge comes from directly instantiating zuul::launcher in openstack_project but instantiating other puppet classes for zuul through the openstackci module
19:19:12 but was abandoned
19:20:09 yah
19:21:06 so if it will help, let's move it into openstackci.
19:21:06 if openstackci can hold launcher stuff, it gets easier to write it I think
19:21:12 i do think it will help jeblair
19:21:50 maybe we leave the default to be jenkins, then we add some flags that turn on a launcher
19:22:06 yep. with a comment saying don't do that. :)
19:22:07 and infra uses that on the zuul launchers, and in zaro's all-in-one test node
19:22:23 do not press big red button
19:22:29 no one ever does that :P
19:23:20 I'll be around to help with the move
19:23:26 i'm going to trust that it will help
19:23:50 how would that look on the system-config side? just call into openstackci pieces to set up zuul?
19:24:03 for the dev node or for the launchers?
19:24:05 one in prod mode and one in test mode?
19:24:47 I mean how does that separation work with zuul_dev.pp and zuul_prod.pp?
19:24:50 zaro: we should be able to follow the same steps as for zuul_merger, but using zuul_launcher
19:25:27 then add openstackci::zuul_launcher into zuul_dev.pp
19:25:57 yah
19:25:59 i guess some of the confusion is also around how to serialize the needed changes between the different puppet modules
19:26:09 if we leave zuul_prod.pp alone for now, that will be less disruption
19:26:10 or has that been addressed now?
19:26:26 and we write openstackci::zuul_launcher so that it can co-exist with openstackci::zuul_scheduler
19:26:39 yes
19:26:49 fungi: I don't think it's an issue, if you don't explicitly use openstackci::zuul_launcher it's a noop
19:27:00 good point
19:27:04 fungi: which will let our existing code be happy, which we can then switch over to use openstackci::zuul_launcher
19:27:09 yah i don't expect us to have to change zuul_prod.pp at all
19:27:10 i guess that directly addresses it anyway
19:27:21 yep
19:28:43 oh now i see what clark is saying
19:28:54 there should be a change to the launchers' puppet, yes, but it shouldn't change the daemon
19:29:50 anything else need covering on the gerrit upgrade prep at this point?
19:30:04 zaro: are you all squared away on what you need for the puppet end of this?
19:30:13 will pabelanger work on the needed zuul_launcher change?
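A minimal sketch of the wrapper class being discussed, assuming puppet-zuul provides a zuul::launcher class analogous to the existing zuul::merger; the class body is a guess, not the change pabelanger ends up writing:

```puppet
# Hypothetical sketch only: the body below, and the assumption that
# puppet-zuul exposes a zuul::launcher class analogous to zuul::merger,
# are guesses rather than the real puppet-openstackci change.
class openstackci::zuul_launcher {
  # Real code would plumb through the same connection parameters the other
  # openstackci::zuul_* wrapper classes accept; omitted here for brevity.
  #
  # Nothing happens unless a node explicitly declares this class, so
  # existing openstackci users and zuul_prod.pp see a complete noop, while
  # zuul_dev.pp (and zaro's all-in-one test node) can opt in alongside
  # openstackci::zuul_scheduler.
  include ::zuul::launcher
}
```

With something along those lines, zuul_dev.pp would just add openstackci::zuul_launcher next to the existing scheduler pieces, leaving zuul_prod.pp alone as discussed above.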
19:30:28 his wip change
19:30:37 or abandoned i guess
19:31:02 I can help get it landed
19:31:11 don't mind doing it
19:31:32 cool, thanks!
19:32:40 #action pabelanger add openstackci::zuul_launcher puppet class
19:32:45 that cover it?
19:32:56 WFM
19:32:56 i think so.
19:33:03 awesome
19:33:17 thanks zaro, pabelanger, nibalizer, clarkb et al!
19:33:37 #topic Open discussion
19:33:50 anybody have anything off-agenda to bring up this week?
19:34:01 we had an outage of ask.o.o over the weekend
19:34:07 ahh, yep. so we did
19:34:13 HDD space issue
19:34:29 between apache and jetty, about 30GB of logs
19:34:35 there's a bigger outstanding issue with ask.o.o... there's a pending upgrade to newer askbot
19:34:35 I recommend we stop running out of disk
19:34:48 if anybody knows jetty, would love some help with how to reduce the log levels for it
19:34:58 i approved it, didn't work i think because there's some issue with the deployment automation for it, so had to roll it back
19:35:20 fungi: for that issue at least we should probably just use vcsrepo
19:35:22 i'm not entirely unconvinced the uptick in jetty log volume could be related
19:35:30 it's what we use everywhere else because ~vcsrepo has never worked for us
19:35:36 er !vcsrepo
19:36:18 yeah, i think the commit history in the puppet-askbot repo indicates that the switch from vcsrepo to git provider was an attempt to solve an issue/shortcoming with vcsrepo
19:36:53 #link https://review.openstack.org/235389 Refactor vcsrepo to git resource to resolve refresh trigger issues
19:37:07 vcsrepo was retriggering deployment on every run or something
19:37:21 ah
19:37:48 trying to ensure a specific git ref caused it to notify each time the resource was evaluated, if memory serves
19:37:57 mrmartin may remember more detail
19:38:22 vcsrepo?
19:38:27 right
19:38:48 yeah, we had some issues with vcsrepo
19:38:53 discussing the decision from a year ago to switch from vcsrepo to git resource for puppet-askbot
19:39:25 which in turn seems to have the opposite problem of not notifying when the ref changes
19:39:31 some features was missing from git resource as I remember
19:39:46 was / were
19:40:49 yeah, it appears that ensuring a specific git ref with vcsrepo notifies other resources every time it's evaluated, and ensuring a specific git ref with the git resource fails to notify other resources when the git ref changes
19:40:55 so vcsrepo triggers things on every run, and the current thing doesn't actually update the git repo?
19:41:22 give me a min, I can check the task history.
19:41:57 clarkb: it updates the git repo but doesn't notify anything else that the git repo changed
19:42:07 clarkb: so actions that trigger when the git repo changes don't fire
19:42:14 yeah, now i don't remember which of those it ended up being
19:42:33 i'd have to go back to my post-mortem in the irc logs from when i rolled things back
19:42:36 ah ok so I misunderstood when this was debugged before and thought the git repo didn't update
19:43:17 https://review.openstack.org/#/c/235389/3
19:43:23 mrmartin: the reason this is coming up though is that we've (suddenly?) started running out of disk space in the last few weeks, and jetty logs look to be what's mostly filling up the filesystem
19:43:29 clarkb: I could also be massively wrong - it's easy to get backwards on this
19:43:41 though i don't think anyone has yet looked into why jetty is logging so much lately
19:43:44 yeah, it looks like nb01 installs nodepool every puppet run
19:43:48 jetty?
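To make the tradeoff described above concrete, a rough sketch of the vcsrepo pattern in question; the paths, source URL, and exec are placeholders rather than anything from the real puppet-askbot manifests:

```puppet
# Illustration only: resource titles, paths, and the redeploy command are
# made up for the example.
vcsrepo { '/srv/askbot-site':
  ensure   => present,
  provider => git,
  source   => 'https://git.example.org/askbot.git',
  revision => 'some-pinned-ref',
  # Per the discussion, pinning a specific ref this way appeared to send a
  # refresh on every puppet run, re-triggering the deployment exec even
  # when nothing had changed ...
  notify   => Exec['redeploy-askbot'],
}

# ... whereas the git-resource approach it was swapped for updates the
# checkout but never sends the refresh, so this redeploy step silently
# stops firing when the ref changes.
exec { 'redeploy-askbot':
  command     => '/usr/local/bin/redeploy-askbot.sh',
  refreshonly => true,
}
```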
19:43:55 we are using jetty for apache solr
19:44:08 so vcsrepo appears to do that when you give it a branch at least
19:44:10 mrmartin: right, i was wanting to rule out the recent failed and rolled-back upgrade as a cause
19:44:25 but the current solr version seems to be deprecated, and the upgrade requires a larger architectural change to the solr deployment.
19:45:07 vcsrepo had a bug, and that was the reason that we changed to this git resource
19:45:10 mrmartin: and if the volume of jetty logging isn't indicative of a deeper problem, we probably just need to look into how to scale back on its retention or configure it to compress its logs when it rotates them
19:45:31 oh wait, i think i may be wrong about that
19:45:40 i see puppet runs where nodepool was not installed
19:45:58 fungi: yeah as a first step, it is a great idea. I don't have access to the ask.o.o site, just the dev one, and it is not reproducing the same issues due to much lower traffic.
19:46:26 mordred, clarkb, fungi: so it looks like vcsrepo, given a branch, does not install on every pulse.
19:46:41 has anyone had time to check what the heck jetty is logging?
19:46:43 mrmartin: i can get you some jetty logs after the meeting if you have an opportunity to look at them
19:46:51 ok, I can check it
19:47:01 i'm pulling some up now
19:47:12 maybe we simply need to decrease the log level, and add logrotate
19:47:29 yes
19:47:34 how do you do that?
19:47:38 because we need to puppet it
19:47:43 lots of info lines about things like "org.apache.solr.core.SolrCore - [core-en] webapp=/solr path=/select/ params={fl=*+score&start=390&q=(vnc)&wt=json&fq=django_ct:(askbot.thread)&rows=10} hits=489 status=0 QTime=0"
19:47:48 looks like maybe performance stats?
19:48:56 i honestly don't know enough about jetty and solr to be able to say for sure, but the fact that they're pretty much all at info loglevel suggests they're benign anyway
19:49:17 so maybe jetty can be configured to compress when it rotates
19:49:32 think it can
19:49:44 yeap, let's try that first, at INFO level it shouldn't flood the logs
19:49:56 how quickly does it fill up the disk?
19:50:07 wow xml config files are fun
19:51:42 mrmartin: looking for a graph, just a sec
19:52:03 mrmartin: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2547&rra_id=all
19:52:21 #link http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2547&rra_id=all ask.openstack.org - Used Space - /
19:52:30 oh ok, thanks
19:52:38 looks like a few weeks ago it ramped up pretty quickly
19:52:48 ianw: hope you're being sarcastic
19:53:03 mrmartin: those sawteeth are from where we deleted some older logs manually each time it filled up
19:53:30 mrmartin: and when / gets full and puppet tries to redeploy askbot for whatever reason, it leaves it in a half-installed state and then the service falls offline apparently
19:53:48 wow
19:54:01 each of those incidents was followed by trying to figure out which libraries to manually reinstall with pip to get things working again
19:55:14 one day, when we roll DIBs for control plane servers, we should have a different mount for /var/logs :)
19:55:19 so that doesn't happen
19:57:07 yeah
19:57:36 okay, we're coming up on the end of our timeslot. any last-minute items?
19:58:40 translate-dev still not working
19:58:46 er on xenial
19:58:51 will try to poke at that more later this week
19:59:15 also puppet + systemd is
19:59:19 how to put it nicely
19:59:22 not great
19:59:25 :)
19:59:29 * fungi is unsurprised
19:59:51 okay, that's time.
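On the ask.o.o disk-space thread above, a sketch of the logrotate half of "decrease the log level, and add logrotate", puppeted as a plain file resource; the log path, retention policy, and config file name are assumptions, not what ask.o.o actually uses, and lowering the INFO-level solr request logging would be a separate jetty logging config change not shown here:

```puppet
# Sketch of the "add logrotate and compress on rotation" idea; the glob,
# schedule, and retention are guesses to be adjusted for the real host.
file { '/etc/logrotate.d/jetty':
  ensure  => present,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  content => "/var/log/jetty/*.log {\n  weekly\n  rotate 4\n  compress\n  delaycompress\n  missingok\n  notifempty\n  copytruncate\n}\n",
}
```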
thanks everyone!
19:59:56 #endmeeting