19:02:48 #startmeeting infra
19:02:49 Meeting started Tue Nov 26 19:02:48 2013 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:53 The meeting name has been set to 'infra'
19:02:57 #topic Actions from last meeting
19:03:12 this should shock no one
19:03:15 #action jeblair file bug about cleaning up gerrit-trigger-plugin
19:03:33 * fungi feigns shock
19:03:46 #action jeblair move tarballs.o.o and include 50gb space for heat/trove images
19:03:56 pleia2: how do you want to track the work on historical publications?
19:04:30 pleia2: want to just open a bug about it?
19:04:35 jeblair: sounds good
19:04:48 last week was kind of chaos, so I didn't ask anyone to help me get started
19:04:49 funzo: how are static volumes?
19:04:52 bit of a snag here...
19:05:02 #link http://www.rackspace.com/knowledge_center/product-faq/cloud-block-storage
19:05:10 "What's the maximum number of Cloud Block Storage volumes I can attach to a single server instance?"
19:05:16 fungi: gee, sorry about your nick there.
19:05:17 "You may have up to 14 Cloud Block Storage volumes attached to a single Server."
19:05:26 hahaha
19:05:29 i like funzo
19:05:36 could be my new friday nick
19:05:37 yay cloud?
19:05:52 so anyway, yeah. suggestions? slowly migrate some pvs from 0.5t to 1t?
19:06:04 so we're at 5.5T out of a possible 14T
19:06:07 right?
19:06:12 that's the best thing I can come up with
19:06:13 i can shift data from smaller to larger cinder volumes and replace them until we have enough
19:06:19 fungi: i think that's the way to go
19:06:31 not a possible 14t either, no. see the faq ;)
19:06:39 I think that's a technical limitation.
19:06:41 "The limit for Volumes and storage is 10 TB total stored or 50 volumes per region (whichever is reached first)."
19:06:58 also, jog0 and sdague merged a patch thursday that should _greatly_ reduce the log size
19:07:02 soren: yes, i've seen a lot of references to xen being unable to present more than 16 block devices into a domu
19:07:09 Up to 8 partitions per disk, up to 16 disks, where two are assigned by Nova by default.
19:07:10 at least older xen versions
19:07:11 jog0: sdague: have a link to that change?
19:07:27 ...and that's all you get with 256 minor numbers to choose from.
19:07:34 clarkb: it cleaned up the isomumblemumble log spam
19:07:42 Uh..
19:07:45 clarkb: something like 10G -> 2G
19:07:46 jeblair: nice
19:07:53 so it's possible this might put more fuel on the 'figure out swift' fire?
19:07:54 Unless, of course, you know how to do basic arithmetic.
19:08:05 * soren shuts up now before he makes more of an arse of himself.
19:08:06 mordred: i'm not sure that fire needs more fuel
19:08:13 well, right :)
19:08:36 * mordred adds fuel to exploding gas bombs...
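(A back-of-the-envelope check of the limits quoted above, as a shell sketch. All numbers come from the FAQ and the Xen discussion; the 16-minors-per-disk layout is an assumed reading of the common xvd scheme of the era, not official Rackspace math.)

    # 256 block-device minor numbers at 16 minors per xvd disk gives a
    # domU at most 16 disks; Nova assigns two of them by default, which
    # lines up with the FAQ's 14-volume cap.
    echo $(( 256 / 16 - 2 ))   # -> 14 attachable volumes per server
    # The binding limit is the per-region quota, not the per-server cap:
    # "10 TB total stored or 50 volumes" means the pool tops out at 10 TB,
    # not 14 x 1 TB (hence "not a possible 14t either ... see the faq").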
19:08:41 so anyway, i'm happy to wiggle a few pvs this week, just wanted to make sure there weren't better options to get us short-term gains with minimal additional effort before i pressed ahead on that
19:08:49 jhesketh is not here, but i hope to get together with him soon to find out if he's planning on working on that and supply him with any help needed
19:09:01 fungi: i think that's the best thing to do now
19:09:06 #action fungi grow logs and docs-draft volumes so they're 50% full
19:09:11 and hopefully we'll get on swift before it becomes an issue
19:09:20 clarkb: https://review.openstack.org/#/c/56471/
19:09:30 #link https://review.openstack.org/#/c/56471/
19:09:35 that's the review on the logs
19:09:39 and if we don't quite make that, we can start ratcheting down the logs we keep (1 year to 9 months, for instance)
19:09:48 jeblair: fungi ++ good intermediate fix
19:10:02 jeblair: ++
19:10:07 jeblair: don't we already only keep 6 months of logs?
19:10:16 fungi: maybe update the docs to say to only add 1TB volumes
19:10:24 fungi: yeah
19:10:27 jeblair: will do
19:10:30 fungi: i thought it was 2 releases or 1 year. i might be wrong.
19:10:39 it's 6 months, one release
19:10:41 #action fungi update docs for static to recommend 1t cinder volumes
19:10:47 nope, i'm wrong.
19:10:51 -type f -mtime +183 -name \*.gz -execdir rm \{\} \; \
19:10:57 yeah, it's a metric crapton of logs
19:11:06 so, er, 4 months then, i guess. yeesh.
19:11:10 anyway, we've beaten this item to death for now
19:11:12 hopefully we won't need that.
19:11:22 #topic Trove testing (mordred, hub_cap)
19:11:33 mordred, hub_cap: what's the latest?
19:11:43 jeblair: I have done nothing on this in the last week - hub_cap anything on your end?
19:13:04 hub_cap: also, I'm a bit cranky that you have someone working on turning on your non-tempest tests when you have not finished getting tempest up and going
19:13:22 wow
19:14:38 hub_cap: in fact, I'm sorry my brain didn't fire on this properly the other day - I believe we should -2 any changes from you that add support for your other thing until you've got tempest wired up
19:14:50 mordred: is there such a change?
19:15:01 hey im ok w/ that. i just figured you'd wanted both and i'd rather he do the legacy tests
19:15:04 jeblair: we told them how to do it a day or two ago
19:15:17 we want everything - but you need tempest tests to be integrated
19:15:21 so you should really get those going
19:15:26 i didn't know there was an order
19:15:28 and I want your legacy tests deleted
19:15:33 yes mordred i agree
19:15:33 because O M G
19:15:46 i know..... i know :)
19:15:48 so, let's get your tempest stuff wired up, _then_ we can get your additional things added
19:15:51 k?
19:15:52 kk
19:16:10 good by me!
19:16:11 hub_cap, mordred: when do you think that might happen?
19:16:32 hub_cap: what are we blocking on for that for you right now? anything on my end?
19:16:42 nothing is blocking other than me not doing the work
19:16:44 that's the blocker heh
19:16:53 great. I'll start poking you more then
19:16:56 dirty
19:17:16 :)
19:17:23 #action mordred to harass hub_cap until he's written the tempest patches
19:17:53 #topic Tripleo testing (lifeless, pleia2)
19:18:09 hi
19:18:29 continuing to patch tripleo for the multi test environments, and working on networking now
19:18:30 uhm, we just did this in #openstack-meeting-alt
19:18:38 I wonder if we can dedup the topics somehow
19:18:41 indeed we did!
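(Looking back at the log-retention fragment fungi pasted at 19:10:51: a fuller form of that command might look like the sketch below. The log root path and the surrounding cron wrapping are assumptions, not the verbatim job.)

    # Prune compressed job logs older than the retention window; -mtime
    # +183 is the "6 months, one release" policy mentioned above, and
    # ratcheting retention down just means lowering that threshold.
    find /srv/static/logs -type f -mtime +183 -name '*.gz' \
        -execdir rm {} \;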
19:18:50 for more, see tripleo meeting :)
19:19:19 okay, i'm not sure we need that detail
19:19:26 pleia2: can you relate this to infra?
19:19:41 no updates for infra at this time
19:20:09 okay. so 'still hacking on test-running-infra' is more or less the status
19:20:13 yep
19:20:27 progress is being made
19:20:33 visibly so
19:20:58 ok
19:21:13 #topic Savanna testing (SergeyLukjanov)
19:21:31 working on setting up jobs for d-g
19:21:33 I've seen patches for this
19:21:36 so i think we basically just told SergeyLukjanov to wait just a bit for the savanna devstack tests
19:21:42 which is an unusual thing for us to do!
19:21:47 jeblair: yay us!
19:22:02 but clarkb is doing a d-g refactor, and we want to get the savanna jobs into that refactor
19:22:09 I'll rebase my change on clarkb's one
19:22:11 but there is a good reason for that. I am reasonably happy with how the d-g job refactor turned out and want people building on top of that
19:22:11 so hopefully it shouldn't be long
19:22:22 i look forward to reviewing it! :)
19:22:47 and I already have some draft code of api tests for savanna
19:23:04 and hope to make a patch to tempest later this week
19:23:33 SergeyLukjanov: excellent! sdague ^
19:23:46 cool
19:24:12 #topic Goodbye Folsom (ttx, clarkb, fungi)
19:24:19 heyhey
19:24:23 it's gone!
19:24:31 woohoo
19:24:33 i bring it up again because we linked to this review last week: https://review.openstack.org/#/c/57066/
19:24:36 yep, folsom is gone, havana tests are all implemented now
19:24:41 and it's abandoned due to -1
19:24:47 oh
19:24:56 so i wanted to check: is the grenade situation straightened out, or do we have work to do still?
19:25:10 it's still in process
19:25:17 i think there is more to do. dprince?
19:25:41 ahh, right. sdague, you said there was still some grenade code support missing for that change?
19:26:00 yeh, the first one is maybe merging today
19:26:34 sdague, just to clarify - is it ok to start with only /smth api tests and client?
19:26:44 sdague, I mean with only one 'endpoint'
19:27:06 #link https://review.openstack.org/#/c/57066/
19:27:18 sdague: can you link the change you're referring to?
19:27:22 SergeyLukjanov: yeh, I think so, but it will take looking at the patch when it comes in to be sure
19:27:58 sdague, sure, thx
19:28:07 jeblair: https://review.openstack.org/#/c/57744/
19:28:12 #link https://review.openstack.org/#/c/57744/
19:28:44 fungi: sorry, too many meetings, let me catch up here...
19:29:37 yeeha
19:30:20 fungi: https://review.openstack.org/#/c/57066/
19:30:38 fungi: I was waiting on some other grenade core fixes first though.
19:30:44 dprince: right, that was the question. sdague caught us up
19:31:08 so i think we're cool on this topic for the moment?
19:31:18 fungi: Cool. FWIW this stuff is actually blocking parts of a Nova patch series for me. So I'll know when it's done!
19:31:22 sounds like it
19:32:08 #topic Jenkins 1.540 upgrade (zaro, clarkb)
19:32:31 i think the change to get nodepool up is in progress
19:32:42 yup, zaro pushed that
19:32:50 so let's check back in on this later
19:32:57 #topic New devstack job requirements (clarkb)
19:33:11 clarkb: you have a change up, yeah?
19:33:14 I do
19:33:31 #link https://review.openstack.org/#/c/58370/
19:33:36 when i find changes to devstack-gate.yaml, i've been suggesting they coordinate with you
19:33:53 except if they are for non-official projects, in which case i think we want those jobs in a different file
19:34:03 jeblair: thank you, it has been helpful. I have been leaving a long comment on those changes linking back to 58370 with details
19:34:05 (and in that case, i don't think they need to consider this refactor)
19:34:22 jeblair: right, I think they can continue to abuse devstack for whatever reason
19:35:03 58370 is handy because it clearly shows how to write devstack gate job templates that are useful in all the places we want to use them: check, gate, check on stable branch for d-g changes, and periodic bitrot jobs for releases
19:35:17 and we cover all of these bases with two templates per logical job. Overall I am pretty happy with it
19:35:36 also WSME/Pecan can overload branch-designator to have special jobs just for them that are otherwise identical to the gate jobs
19:35:54 so this will help us integrate the world
19:36:21 clarkb: how does that work? (i haven't looked at the change) what would they set, for example?
19:37:21 jeblair: the tail end of the job names has -{branch-designator} in it. I have been using that to distinguish between stable-grizzly and stable-havana and master for periodic jobs, and use -default for things that run against proposed changes on the proposed branch
19:37:42 got it
19:37:47 jeblair: WSME/pecan could put wsme-default in that var instead and get a new job that zuul won't put in the gate with everyone else but that is otherwise identical
19:38:18 #topic Jenkins Job Builder Release (zaro, clarkb)
19:38:26 #link https://pypi.python.org/pypi/jenkins-job-builder/0.6.0
19:38:28 exists ^
19:38:39 oh cool, so that got done. ++ for doing that
19:38:44 cut maybe an hour ago
19:38:57 #topic Puppetboard (anteaya, Hunner, pleia2)
19:38:57 I can update the bug that asked us to cut a release if that hasn't been done yet
19:39:01 clarkb: ++
19:39:05 o/
19:39:18 so last week Hunner gave anteaya and me an internal demo of puppetboard
19:39:23 clarkb: and probably need to switch other outstanding bugs from committed to released too
19:39:34 it's pretty cool, has published logs from servers, basic stats
19:39:49 faster than dashboard, but does require use of puppetdb (which we don't currently use)
19:39:56 I have code, but still working out the apache manifest stuff (it's an older module version that my prototype used). So not yet pushed to review
19:40:13 did we have any other requirements?
19:40:19 where does puppetdb run?
19:40:23 Puppetdb essentially means swapping out the puppet.conf lines that post to the dashboard and adding the lines for sending to the puppetdb
19:40:25 puppet master
19:40:29 (right?)
19:40:40 Currently the puppetdb would be running on the puppetboard box, since it's kind of related
19:40:58 okay, i like keeping the master simple.
19:41:02 Currently masters -> dashboard; in the future masters -> puppetdb -> puppetboard
19:41:07 that makes the most sense to me. it's not privileged in any way, right?
19:41:08 Yeah, low impact to masters
19:41:17 It's not privileged, no
19:41:26 yeah, better on the puppetboard system then
19:41:50 Hunner: i see that it stores facts and catalogs from each node; does that include hieradata?
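(Returning briefly to clarkb's branch-designator explanation above: a minimal sketch of how one template name expands. The job base name here is made up, not taken from 58370.)

    # Hypothetical expansion of a job template whose name ends in
    # -{branch-designator}: per-branch periodic variants, a -default
    # variant for proposed changes, and a wsme-default variant that zuul
    # keeps out of the shared gate queue.
    for designator in stable-grizzly stable-havana master default wsme-default; do
        printf 'example-devstack-vm-%s\n' "$designator"
    done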
19:42:02 it does run on postgres (which may or may not be a problem)
19:42:13 One point to note is: just like mysql running on the dashboard box, postgresql will run on the puppetdb/puppetboard box
19:42:16 Hunner: specifically, i'm wondering if this puts plaintext passwords on more systems.
19:42:30 jeblair: It does not store hieradata; just the facts, reports, and compiled catalogs
19:42:38 Hunner: great
19:42:42 Think of hieradata like manifests... neither of those are outputs
19:42:47 Only inputs
19:42:57 puppetdb stores the outputs
19:43:04 (facts, catalogs, reports)
19:43:08 *nod*
19:43:18 And is HTTP REST API queryable
19:43:35 But that's extra shiny that you don't need to care about
19:43:59 i can see caring about it down the road. being able to query system status for other things could be very, very helpful
19:44:21 It's essentially trying to be a centralized "best guess" at the whole infrastructure
19:44:32 So yeah, useful down the road for what you say
19:45:03 But as a gui report server ("What's failing? Oh...") it works great
19:45:32 the postgres thing is kind of a bummer, since we're trying to move to cloud databases (which are only mysql right now), but i think we can live with it.
19:46:08 so what's next? wait for Hunner to finish apache manifests?
19:46:24 Yep. Hopefully have a review by next meeting
19:46:52 pleia2: any other questions?
19:46:57 Hunner: thank you very much!
19:47:03 that's it from me
19:47:10 i take it postgres is a hard requirement (substring search or other non-mysql feature)
19:47:10 thanks Hunner
19:47:34 fungi: At this time, yes. I could ask for the details and share, since I'm curious too
19:47:42 cool. thanks!
19:47:55 #topic Multi-node testing (mestery)
19:47:55 The backend is actually swappable, though the only two existing backends are postgres and in-memory
19:48:05 hi
19:48:32 So, anteaya set me up with this slot to discuss this, mostly wanted to ask some questions and get some direction around this.
19:48:55 gah. had a network glitch. someone talk to me about hard reqs for postgres at some point please
19:49:22 Hunner: maybe you can pass on what you find to mordred
19:49:30 mestery: what are your questions?
19:49:46 For the Neutron ML2 plugin, we would like to do gate testing in a multi-VM environment to test out the tunneling capabilities and new drivers in ML2.
19:49:55 So a) is that possible today (multi-node testing in the gate)?
19:50:09 Hunner: yeah - grab me offline and let's chat about that - I'll try not to troll you tooo much
19:50:50 mestery: we're working on some solutions around that in the tripleo testing workstream
19:51:03 mestery: a) it's not possible today
19:51:06 mordred: Great!
19:51:24 a) it's not ready yet - but something tells me that we'll have to cook up the same things to do it outside of that workstream, so perhaps you guys could lend a hand to that if you have some bandwidth
19:51:37 mordred: Perfect, I think that would be good!
19:51:42 mordred: i'm hesitant to suggest that as a solution; i'm not sure it will suffice
19:51:42 We have some ML2 folks who could help with this.
19:51:57 mestery: i take it it's not feasible to test out tunneling entirely within a single devstack install on one machine
19:52:15 mordred: afaik, the multi-node tripleo work is focused on a limited set of multi-node hardware environments
19:52:23 (i.e. each network device as a vm in devstack)
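(An aside on Hunner's earlier note that PuppetDB is queryable over an HTTP REST API: a minimal sketch of such a query, assuming the v3 API of PuppetDB releases from that era. Endpoint paths and query syntax vary by PuppetDB version, and the host name is made up.)

    # List the nodes PuppetDB knows about, then pull one fact fleet-wide.
    curl -s 'http://puppetdb.example.org:8080/v3/nodes'
    curl -s -G 'http://puppetdb.example.org:8080/v3/facts' \
        --data-urlencode 'query=["=","name","operatingsystem"]'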
19:52:28 jeblair: that is a good point
19:52:29 fungi: We can test tunneling perhaps, but we need multiple nodes to run agents on each node for testing as well.
19:52:47 fungi: The thing we want to test is the agent communication as well and that path.
19:53:07 mordred: and at the moment, the work the tripleo folks are doing is not intended to be reusable in infra
19:53:15 in this case "nodes" means more than one neutron controller instance?
19:53:18 I apologize, I'm still learning a lot of this infra code as well, so please bear with me. :)
19:53:45 mestery: dprince, lifeless, derekh and I are having a status chat on Google+ about our work with tripleo testing in a bit (maybe right after this meeting? I can ping you), you're welcome to be a fly on the wall if you want to see where we are at
19:53:59 fungi: No, one control node, multiple compute nodes.
19:54:17 pleia2: any chance you guys could open that up a bit?
19:54:19 i wonder if we could set up more than one nova service on one devstack machine
19:54:25 jeblair: oh yeah, anyone is welcome
19:54:27 pleia2: we don't use g+ for openstack development
19:54:31 pleia2: Cool! Shoot me the info, if I can make it I will join.
19:54:45 jeblair: oh, that, maybe we should use the asterisk
19:55:27 mestery: i'd like to get you some straightforward answers to your question
19:55:42 jeblair: So it sounds like it's not supported, and it will take some work to make it happen?
19:56:06 mestery: please feel free to check out what the tripleo folks are doing, but i do not want you to be misled into thinking that is the shortest or most certain path to multi-node gate testing.
19:56:19 jeblair: Understood, I'll do that. Thank you!
19:56:40 and it's also worth exploring whether multi-node testing (in the sense we've been discussing) is necessary for what you want to test too
19:56:42 mestery: for the at-scale testing we do on virtual machines in the gate, we do need a multi-node solution
19:56:59 jeblair: OK.
19:57:16 fungi: It is necessary, because we need multiple Neutron nodes, 1 control node and one with neutron agents on it.
19:57:25 mestery: however, it's a few steps away, and probably won't be available for a while. we need to have non-jenkins test workers running in order for that, and there are still some things to do before we can get even to that point.
19:57:49 For multi-node testing with virtual machine groups, there is rspec-system or its rewritten counterpart beaker-rspec. At least that's what we use (if I understand your requirements)
19:57:51 jeblair: Understood. This came up in our Neutron ML2 meeting last week, thus my interest in talking to everyone here.
19:58:09 * ijw sneaks out of the woodwork
19:58:37 Hunner: the problem here is we use public cloud resources that intentionally hobble the networking things we can get away with
19:58:40 I've run VM groups for Openstack testing by starting a VM that, in turn, installs and starts others. I wasn't using devstack, I was using something stackforge-based, but there's no reason why devstack wouldn't also work.
19:59:05 and we are trying to test networking things
19:59:19 Sounds good. Carry on
19:59:58 ijw: Thanks, will check that out. Some things to think about here, thanks everyone.
20:00:32 the problem statement for us is more like: we have a pool of nodes that are all online and connected to zuul -- how do we get a collection of those running a single job?
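(For reference, a rough sketch of the seed-VM pattern ijw describes above: the image, flavor, and user-data script names are all hypothetical, and this assumes the python-novaclient CLI of the time.)

    # Boot one "seed" VM whose user-data script then launches and
    # configures the rest of the test group from inside the cloud.
    nova boot --image ubuntu-precise --flavor m1.large \
        --user-data launch-subnodes.sh multinode-seed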
20:00:41 we brainstormed about that a bit at the summit and have some ideas
20:00:49 but regardless, they are still a few steps away
20:00:54 and our time is up
20:01:02 so thanks everyone, and sorry about the topics we didn't get to
20:01:12 if there's something urgent, ping in -infra
20:01:14 #endmeeting