16:00:52 #startmeeting Large Deployment Team
16:00:53 Meeting started Thu Apr 16 16:00:52 2015 UTC and is due to finish in 60 minutes. The chair is VW_. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:56 The meeting name has been set to 'large_deployment_team'
16:01:11 howdy, folks
16:01:16 hi
16:01:19 o/
16:02:14 Alright, let's get started.
16:02:38 #topic Ops Midcycle Meetup debrief
16:02:57 so, I've gone over this - https://etherpad.openstack.org/p/PHL-ops-large-deployments - a few times
16:03:02 VW_: I can't find the agenda. Do you have a link?
16:03:11 yep, belmoreira - one sec
16:03:38 https://wiki.openstack.org/wiki/Meetings/LDT
16:03:50 guess I skipped roll call :\
16:04:29 VW_: thx
16:04:32 yep
16:04:36 o/
16:04:44 howdy mdorman
16:05:29 so, andyhky is here. I don't think jlk has joined, but that gives us at least one of the moderators at the midcycle
16:05:42 o/
16:05:51 andyhky: any overarching takeaways from the discussion there?
16:06:49 The discussion around pruning deleted instances brought this bug up: https://bugs.launchpad.net/nova/+bug/1226049 / https://review.openstack.org/#/c/109201/
16:06:50 Launchpad bug 1226049 in OpenStack Compute (nova) "instance_system_metadata rows not being deleted" [High,In progress] - Assigned to Alex Xu (xuhj)
16:07:55 did the group commit to any action around it, andyhky?
16:09:18 I know we are actually running a script here that purges deleted instance data older than 90 days
16:09:20 We didn't have specific commitments; it was more of a discussion of issues (e.g., pruning, rabbit leaks) and best practices.
16:09:33 kk
16:09:58 I see there was quite a bit of discussion around adding/testing new hosts/nodes as well
16:10:34 speaking of which - does anyone have a solution for deploying to disabled hosts?
16:10:55 I thought that the --availability-zone az:host trick would work - but it doesn't :-/
16:11:00 yeah
16:11:07 We had a discussion on recommending disabled-by-default for a new compute node.
16:11:23 The room seemed to want the default to change to disabled.
16:11:23 klindgren: no
16:11:37 one of my engineers has a patch that will build to a disabled host if you target it specifically, klindgren
16:11:43 because the AZ thing didn't work for us either
16:11:56 let me see if I can find it
16:12:54 andyhky: https://review.openstack.org/#/c/136645/7 is related. It proposes a disable reason by default.
16:15:49 any follow-up items in general from the midcycle?
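A minimal sketch of the "--availability-zone az:host" trick referenced above, for readers who have not seen it, written against the python-novaclient API of that era; the credentials, endpoint, image, flavor, and host name are placeholders, and, as the discussion notes, the request still will not land on a disabled host, which is the gap the patch VW_ mentions aims to close.

```python
# Illustrative sketch (not from the meeting) of the "az:host" forced-host
# boot discussed above, using python-novaclient as it existed around Kilo.
# All credentials, IDs, and host names below are placeholders.
from novaclient import client

nova = client.Client("2", "admin", "password", "admin-project",
                     "http://keystone.example.com:5000/v2.0")

# "availability_zone" in the form "zone:host" asks the scheduler to place
# the instance on one specific hypervisor. Per the discussion, this does
# NOT succeed when the target host's nova-compute service is disabled.
nova.servers.create(
    name="canary-instance",
    image="IMAGE_UUID_PLACEHOLDER",
    flavor="FLAVOR_ID_PLACEHOLDER",
    availability_zone="nova:new-compute-node-01",
)
```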
16:15:57 or was it all just discussion on these particular topics?
16:16:10 I think it's worth coming up with a recommendation on the status of new compute nodes
16:16:22 sounds fair
16:16:41 do we want that ahead of Vancouver
16:16:47 or do we want it on our agenda for there?
16:16:53 I'll start a ML thread
16:17:30 #action andyhky Start ML thread on status of a new compute node
16:17:37 cool - thanks, andyhky
16:17:59 I'm still looking for the patch I mentioned above
16:18:08 any other items related to the mid-cycle we want to dive into here?
16:18:26 I think that's enough from the mid-cycle
16:18:35 cool
16:18:54 #topic Vancouver Ops meetup - LDT session
16:19:12 so, we have like 3 hours I think at our disposal for working group session(s)
16:19:15 let me verify
16:19:50 ok - not quite - about 2:20
16:20:11 but 3 sessions
16:20:24 1:50 - 2:30
16:20:31 2:40 - 3:20
16:20:41 and 3:30 - 4:10
16:21:28 I really think we need to come out of this one with a new blueprint, comments/recommendations on existing blueprints, or something like that
16:21:48 that's my opinion, but I think the following comment on the planning etherpad is valid:
16:22:11 "I like this group, too, but what real work would we do here?"
16:22:22 any of you have thoughts?
16:25:12 yes, we need to have a "procedure" to raise and share our "concerns"
16:26:09 indeed. It's hard because I know most of us are busy fighting fires and everything else that comes with running big clouds
16:26:14 because, after the summit, how will developers be notified about our ideas?
16:26:36 I imagine that only very few will attend the ops sessions
16:26:53 of the devs?
16:26:56 sorry - just clarifying
16:27:33 yes, sorry
16:27:44 so, proposal then:
16:28:33 belmoreira / VW_ - so with this compute node status change, I'm just going to propose a spec and see where it lands
16:29:02 session 1: Pull together all our thoughts on adding new hosts / managing capacity
16:29:25 this would include new specs like andyhky's and comments from the group in the room on any others
16:29:38 andyhky: great
16:29:41 session 2: Find something from the cells v2 folks we can bring to the group and give them feedback on
16:30:07 session 3: general business. review meeting schedule, etc. Find out how we can improve the processes and make our group more influential
16:30:15 If we have a discussion and it results in recommendations, we should have a spec owner and deliver the spec upstream
16:31:12 yeah, I'm thinking before we even leave the room, andyhky
16:31:32 if possible
16:31:36 The spec owner is identified before leaving the room
16:32:26 anyone have any issues with the rough schedule above, then?
16:32:54 with the requirement that we hold ourselves to turning all recommendations into specs, or feedback on existing specs, with an owner
16:34:08 sounds good to me
16:34:49 +1
16:35:24 cool - then I'll start working on the etherpad
16:35:51 #action VW_ start updating etherpad for Vancouver with proposed session schedules
16:36:16 #action VW_ reach out to Cells V2 devs to find an issue/spec we can work on and provide feedback/specs
16:36:38 anything else folks want to discuss for Vancouver?
16:37:06 sorry, a bit distracted here with standup meetings. that all looks good to me for YVR
16:37:20 cool
16:37:45 in that case...
16:37:52 #topic Other Business
16:38:03 The floor is open
16:38:11 anything else anyone wants to discuss?
16:40:29 have any of you looked at this - https://review.openstack.org/#/c/169836
16:40:44 yup
16:41:13 i kinda go back and forth on whether this is a good long-term solution, but i think it’s one of those “it’s better than nothing” things
16:41:23 yeah, me too mdorman
16:41:32 johnthetubaguy is spot on about not assuming the VMs are down
16:41:57 i haven’t read the comments for the last few days, so haven’t seen that yet.
16:42:05 in our case the compute node is virtualized, so a separate "node" from the host
16:42:08 good discussion to have, nonetheless
16:42:34 I agree
16:42:51 that's why I thought I'd make more of this group aware :)
16:42:56 yup yup
16:43:34 I need to leave...
16:43:40 but I get a little bi-polar about it too. Probably because we have several automation services that we've built to handle down hosts, so we can just have them mark the disabled bit for us
16:43:42 ok belmoreira
16:43:45 thanks for coming
16:43:52 see you in Vancouver
16:43:58 VW: my main worry is that we need a long-term plan, really, before adding another band-aid
16:44:47 yeah, johnthetubaguy - I'm with you
16:45:46 +1
16:45:47 VW_: totally agreed that it's an important thing to sort out
16:46:07 yeah, johnthetubaguy that's why I wanted more of the LDT folks to at least be aware
16:46:17 cool, sounds good
16:46:32 it sounds like we might make one of our sessions in YVR focused on adding new hosts/capacity management
16:46:38 maybe we can work this one in too
16:46:42 will be tight
16:47:20 So - we have been running the updated oslo.messaging
16:47:30 with the heartbeat stuff under the juno code base
16:47:38 rabbit is much much much better
16:47:46 1.9, klindgren
16:47:57 oslo.messaging that is
16:47:57 iirc 1.8.1
16:48:30 hmm - good to know.
16:48:43 yea, 1.8.1 - they pulled everything into that and cut a tag.
16:49:00 We had some major maintenance the other day that left some compute nodes network-down for a long time
16:49:06 they all recovered without issue
16:49:30 no lost/hung rabbit rpc stuff - it all "just worked" within a few minutes of the network coming online
16:49:47 yeah, good to know
16:50:13 we got hit with the issue the other day when a network device failed over and cut the connection between computes and rabit
16:50:16 rabbit even
16:50:20 so I'm all for getting that fix
16:51:03 for those who don't know - here is a related link - https://review.openstack.org/#/c/146047/
16:51:09 yea - looking forward to the day when I have 99 problems and rabbitmq is not one.
16:51:27 indeed
16:51:34 message queues are hard, evidently
16:51:37 :)
16:52:36 ok, well we have a few minutes left
16:52:47 anything else?
16:53:09 i’m good
16:53:19 thanks for organizing, VW_
16:53:29 my pleasure
16:53:41 thanks, to all of you, for helping us get a plan for YVR
16:53:52 want to make those sessions as productive as possible
16:54:31 alright, well, I'll give everyone 5 minutes back
16:54:36 see you all in Vancouver!
16:55:05 thanks for joining today
16:55:10 #endmeeting
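For context on the heartbeat change klindgren describes, here is a hedged sketch of the rabbit heartbeat options introduced in the oslo.messaging 1.8.x series as they would appear in nova.conf; the section name, values, and exact option spellings should be verified against the release actually deployed (older packages expose similar options under [DEFAULT]).

```ini
# Illustrative nova.conf snippet (not from the meeting) enabling the AMQP
# heartbeat support added in the oslo.messaging 1.8.x series.
# Verify option names and section against the deployed release.
[oslo_messaging_rabbit]
# Seconds without a heartbeat before the connection is considered dead.
heartbeat_timeout_threshold = 60
# How many heartbeat checks to run within the timeout window above.
heartbeat_rate = 2
```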