14:00:11 #startmeeting tripleo
14:00:15 Meeting started Tue Apr 12 14:00:11 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:18 The meeting name has been set to 'tripleo'
14:00:27 #topic rollcall
14:00:29 o/
14:00:31 Hi all!
14:00:33 o/
14:00:34 hi
14:00:35 o/
14:00:42 * derekh lurks while working on rh1 cloud
14:01:09 o/
14:01:14 o/
14:01:17 o/
14:01:51 o/
14:01:51 Ok then, let's get started
14:01:53 o/
14:02:00 #topic agenda
14:02:00 * one off agenda items
14:02:00 * bugs
14:02:00 * Projects releases or stable backports
14:02:00 * CI
14:02:03 * Specs
14:02:15 * Open discussion
14:02:20 I made one minor change:
14:02:30 #link https://wiki.openstack.org/wiki/Meetings/TripleO#One-off_agenda_items
14:02:40 I moved one-off items to the start, as we kept running out of time
14:03:00 shardy: good idea
14:03:03 I propose we keep them time-boxed at say 5 mins each max, and move them to open discussion if they run over, sound reasonable?
14:03:15 yep
14:03:31 Cool, so there are two this week
14:03:37 #topic one off agenda items
14:03:48 trown: want to give us an update on tripleo-quickstart?
14:03:57 I'm not sure if those are from last week or not
14:03:58 shardy: sure
14:04:07 they are from last week
14:04:23 tripleo-quickstart code is imported, and third-party CI jobs are running
14:04:47 still need to move over github issues and make some minor CI fixes
14:05:02 trown: excellent, sounds like good progress :)
14:05:02 shardy: yeah, so I usually clean up that wiki right before the meeting :). And this week I didn't...
14:05:25 dprince: hehe, I'll do it right after we finish :)
14:05:37 Ok, one other one-off item is summit topics:
14:05:40 documentation is a bit blocked by a lack of an image for upstream, but that is a pretty big topic
14:05:49 #link https://etherpad.openstack.org/p/newton-tripleo-sessions
14:06:12 trown: ack - is that something we can follow up on on the ML or does it need discussion now?
14:06:36 I think ML, plus maybe CI subteam meeting if/when it is an official thing
14:06:43 trown: +1, thanks
14:06:56 Ok so I refactored all the session proposals
14:07:12 and other than the TLS one I managed to capture all the ideas, with some combined into sessions
14:07:18 it looks a lot like last summit really
14:07:30 anyone have any final comments or objections to those?
14:07:38 I need to propose them to the schedule this week
14:08:13 shardy: seems good to me
14:08:44 Ok, any issues let me know, otherwise I'll propose them later today or early tomorrow, thanks!
14:08:46 it looks excellent
14:09:09 #topic bugs
14:09:25 * beagles wanders in late
14:10:11 Anyone have any specific bugs to mention?
14:10:33 I see a number of CI related ones, and I'm aware we've got CI issues generally, but anything else to highlight?
14:10:41 we got two bugs preventing us from moving the current-tripleo pin: https://bugs.launchpad.net/oslo.config/+bug/1568820 and
14:10:42 Launchpad bug 1568820 in OpenStack Compute (nova) "Duplicate sections in generated config" [High,Confirmed]
14:11:03 derekh: aha, that was going to be my next question :)
14:11:35 why don't we purge nova.conf before the puppet run?
14:11:36 I seem to have lost the other bug; anyway, it's a mistral thing
14:11:45 I think puppet-nova has support for that
14:11:55 EmilienM: oh, that is a good idea
14:11:58 EmilienM: we could, and it works for the undercloud
14:12:14 EmilienM: I've tried it for the overcloud, but I don't think I did it correctly
14:12:16 EmilienM: we might check to make sure there isn't also a packaging issue
14:12:22 dprince: it's not
14:12:34 EmilienM: purging via puppet would be fine too... but might cover over the actual bug
14:12:41 derekh: okay
14:12:41 dprince: oslo-config-generator is generating duplicate sections
14:12:49 we do it in glance https://github.com/openstack/puppet-glance/blob/5d6e42356efb79e62bf3f1f464a444be39b2dca4/manifests/registry.pp#L116-L119
14:12:53 I can submit a patch in nova
14:13:05 #action EmilienM to patch puppet-nova to add support for nova.conf purging and patch tripleo to enable it
14:13:09 derekh: Is there a bug open for that?
14:13:21 bnemec: https://bugs.launchpad.net/oslo.config/+bug/1568820
14:13:22 Launchpad bug 1568820 in OpenStack Compute (nova) "Duplicate sections in generated config" [High,Confirmed]
14:13:22 derekh: okay, so oslo-config-generator is causing it to be packaged with duplicates then?
14:13:29 derekh: Thanks
14:13:45 dprince: yup, and that seems to confuse the puppet-nova module
14:14:20 derekh: cool, let's fix that then rather than turn on puppet-nova config purging
14:14:25 bnemec: The fix I have up probably isn't suitable; I've been trying to work on a better solution but got dragged off to something else
14:15:09 anyway, that's all from me, those are the things in the way of moving the tripleo pin
14:15:21 Ok, sounds like this will require further discussion after we get CI back up, thanks for the update derekh
14:15:43 #topic Projects releases or stable backports
14:15:56 dprince: ok so no need to patch puppet-nova?
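[Editor's note: bug 1568820 above is about oslo-config-generator emitting duplicate sections in the generated nova.conf, which then confuses puppet-nova. As an aside, Python's stdlib configparser in strict mode will flag such duplicates; this is a hypothetical standalone checker for illustration, not part of any TripleO or oslo tooling:]

```python
import configparser

# Hypothetical sample mimicking a generated nova.conf with a repeated
# section; the actual duplicated content in bug 1568820 may differ.
SAMPLE = """\
[DEFAULT]
debug = True

[keystone_authtoken]
auth_url = http://192.0.2.1:5000

[keystone_authtoken]
auth_url = http://192.0.2.1:5000
"""

def first_duplicate_section(text):
    """Return the name of the first duplicated section, or None."""
    parser = configparser.ConfigParser(strict=True, interpolation=None)
    try:
        parser.read_string(text)
    except configparser.DuplicateSectionError as exc:
        return exc.section
    return None

print(first_duplicate_section(SAMPLE))  # keystone_authtoken
```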
14:16:12 So a couple of updates - we branched puppet-tripleo yesterday as that missed the initial stable/mitaka branching
14:16:45 and gfidente updated the wiki with a revised release process that uses openstack/releases to push the tags & announce the release to the ML
14:16:56 EmilienM: don't think so
14:17:00 #link https://github.com/openstack/releases
14:17:12 #link https://review.openstack.org/#/c/303986/
14:17:27 #link https://wiki.openstack.org/wiki/TripleO/ReleaseManagement#How_to_make_a_release
14:18:32 That's the first step towards aligning with the release process changes discussed here (for all projects):
14:18:37 #link http://lists.openstack.org/pipermail/openstack-dev/2016-March/090737.html
14:19:18 So anyone can now propose a release, and we can potentially look at doing interim milestone releases too via a similar method
14:20:27 Anyone have any comments on that, or other release/backport content?
14:21:38 Ok then
14:21:40 #topic CI
14:21:49 * shardy dons tin-foil hat
14:21:50 rh1 down as of this morning
14:22:01 currently working on it
14:22:21 derekh: any info to share re the cause yet?
14:22:23 besides that, jobs are running too long and hitting timeouts
14:22:45 derekh, yes, we encounter more infra issues than usual, I prepared some statistics for today:
14:22:45 #link https://etherpad.openstack.org/p/tripleo-issues-analysis
14:22:57 sshnaidm, very nice!
14:23:13 shardy: nope, this is the 3rd or 4th time this exact thing has happened (in 2 years), 100,000s of arp requests flying around the network
14:23:36 shardy: the only way I've ever figured out to deal with it has been a reboot of everything
14:24:21 https://media.giphy.com/media/F7yLXA5fJ5sLC/giphy.gif
14:24:22 sshnaidm: thanks, I guess everything will fail today, but when the cloud is back up this sort of analysis will be useful
14:24:30 shardy: the caching work I've been doing is blocked on getting the tripleo pin moved
14:24:35 jdob: +1!
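[Editor's note: with the revised process above, a release is proposed by adding an entry to a deliverable file in the openstack/releases repo. This fragment is purely illustrative of the general deliverable format; the file path, version number, and commit hash are placeholders, not a real TripleO release:]

```yaml
# deliverables/mitaka/puppet-tripleo.yaml (illustrative only; the
# version and hash below are placeholders, not a real release)
launchpad: puppet-tripleo
releases:
  - version: 2.0.1
    projects:
      - repo: openstack/puppet-tripleo
        hash: 0123456789abcdef0123456789abcdef01234567
```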
14:24:38 derekh: ack, thanks
14:24:56 sshnaidm: yup, we got lots of problems ;-(
14:25:00 one more thing
14:25:23 ... panic ...
14:25:35 I *think* the dns errors we have seen in jobs a few times coincide with nodepool being restarted
14:25:59 that should be ok, but it hits us with lots of new instance requests at the same time
14:26:06 and kind of DOS's us
14:26:13 I think that's what happens anyway
14:26:22 derekh: interesting
14:26:39 re the ram usage, I've been doing some profiling trying to figure out things we can do to reduce it
14:26:44 that seems like another argument for being totally third-party
14:27:00 so far I've got a patch up which disables ceilometer/aodh because we don't use them, that saves over 300M
14:27:09 trown: it would
14:27:12 shardy: nice
14:27:53 There are also a lot of processes spawning multiple workers on the undercloud which we may be able to trim down, like we have for the overcloud
14:28:00 shardy: just make those composable roles?
14:28:14 dprince: composable undercloud?
14:28:19 I'm also attempting to profile inside heat as that's one of the worst offenders (along with mariadb)
14:28:52 We probably have mariadb tuned for real deployments, which isn't ideal for a memory-limited CI environment.
14:28:53 trown: that too, the idea of using Heat again for the undercloud is worthy of getting back to
14:28:55 dprince: Yeah, well initially I guess it'll be some conditionals in the manifest, but we could look at using the same service profiles from puppet-tripleo perhaps?
14:29:10 any ideas why the overcloud deploy takes > 1hr? lots of stops to sync up?
14:29:12 bnemec: Yeah, I was wondering if we should add a "minimal" option to undercloud install
14:29:16 trown: it was one of TripleO's foundation ideas... i.e. the feedback between the over and undercloud
14:29:17 for HA
14:29:23 that turns on a different config tuned for minimal footprint
14:30:12 derekh: the overcloud deploy takes 10mins for me locally, so I suspect it's performance of the platform
14:30:30 shardy +1
14:30:33 derekh: at least some of the time it is due to the nova-ironic race, which we have retries to deal with, but retries are time-expensive
14:30:33 shardy: Yeah, or we use slagle's custom hieradata to pass in a CI-specific config. I feel a little weird adding user-facing options for CI only.
14:30:34 shardy: for the HA overcloud?
14:31:06 a 10 minute deploy? is there a --turbo option? :)
14:31:13 :-)
14:31:13 overcloud only
14:31:24 I'm up to 20 minutes locally these days.
14:31:26 but yes that's it for me too
14:31:28 bnemec: Yeah, I was thinking the same, but it'd be good to offer the option for developers too I think
14:31:45 bnemec: This is for a 2-node nonha deployment on a box with an SSD
14:31:47 shardy: Maybe put something in tripleo.sh so dev environments get that by default.
14:32:00 it does take 20mins on my other (slower non-SSD) box
14:32:02 shardy: Yeah, I'm SSD-backed too. It's still slow.
14:32:25 honestly for me that is just about using 2 cores and 8g per node
14:32:29 bnemec: sure, we can start with tripleo.sh and prove out what options we need there I guess
14:33:01 shardy: Yeah, we can figure something out.
14:33:26 couldn't we get rid of nova-conductor on the uc too?
14:33:42 slagle: I tried. I don't think it is possible anymore
14:33:55 slagle: nova requires it now I think :/
14:33:59 how can we get rid of nova conductor?
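[Editor's note: the worker-trimming and CI-specific tuning discussed above would presumably land as hieradata overrides, along the lines of slagle's custom hieradata mentioned here. The keys below are illustrative puppet module class parameters assumed from module conventions of the time, not a vetted or tested list:]

```yaml
# Illustrative low-memory hieradata override for an undercloud CI/dev
# environment; exact parameter names depend on the puppet module
# versions in use and are assumptions here.
heat::engine::num_engine_workers: 1
heat::api::workers: 1
nova::api::osapi_compute_workers: 1
neutron::agents::metadata::metadata_workers: 1
glance::api::workers: 1
```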
14:34:01 oh ok
14:34:01 :(
14:34:12 slagle: we already have a custom OS::Nova::Server resource in tripleo-common, so I wonder if we actually need nova at all
14:34:27 that's more of a long term discussion tho ;)
14:34:40 that would save some resources :)
14:35:04 shardy: exactly, but I think we'd want Ironic heat resources too then
14:35:27 dprince: Yeah, how to wire it up definitely needs more thought
14:35:40 getting rid of nova in the UC is a worthy investigation
14:36:14 shardy: I've been keen on dropping Nova in the UC for a while now. But we've actually added more features around Nova, not less :/
14:36:34 dprince: Not really, we've got a custom nova resource and a custom nova scheduler
14:36:50 shardy: like some of the scheduler related things in t-h-t
14:37:08 the scheduler filter thing could be easily reimplemented by filtering an ironic node list
14:37:19 dprince: that actually didn't work, bnemec had to write a custom filter
14:37:29 shardy: perhaps so, and I'd like to see it work out
14:37:30 so again, nova isn't actually buying us much there
14:37:46 anyway, shall we move on and table this for the beer-track at summit? :)
14:38:02 shardy: the reason for nova is multi-tenant use cases
14:38:08 I'll try to carve out some time to revive my ironic resources and look at how it might be done
14:38:16 shardy: like Magnum would require it for example
14:38:44 shardy: if we've got no plans for multi-tenant undercloud cases or to use projects that require that then I think we can seriously consider dropping it
14:40:21 dprince: Yeah, there's a bunch more complexity than nova around that tho right - I'm not sure we'd want to support Ironic in those environments anyway unless all the separation around baremetal to tenant is worked out?
14:41:13 #topic specs
14:41:20 shardy: last summit I think there was a lot of focus around multi-tenant baremetal clouds that use nova...
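[Editor's note: shardy's suggestion that the scheduler filter could be reimplemented by filtering an ironic node list might look roughly like the sketch below. The node-dict shape and the profile capability convention are assumptions loosely based on how TripleO tags nodes, not code from any repo:]

```python
def nodes_matching_profile(nodes, profile):
    """Return UUIDs of available nodes whose capabilities include the profile.

    Assumes each node is a dict with 'uuid', 'provision_state', and a
    'capabilities' string of comma-separated key:value pairs, mirroring
    how TripleO stores profile tags on ironic nodes.
    """
    matched = []
    for node in nodes:
        caps = dict(
            pair.split(':', 1)
            for pair in node.get('capabilities', '').split(',')
            if ':' in pair
        )
        if node.get('provision_state') == 'available' and caps.get('profile') == profile:
            matched.append(node['uuid'])
    return matched

# Hypothetical node list, as might come back from an ironic client.
nodes = [
    {'uuid': 'aaa', 'provision_state': 'available',
     'capabilities': 'profile:control,boot_option:local'},
    {'uuid': 'bbb', 'provision_state': 'active',
     'capabilities': 'profile:control,boot_option:local'},
    {'uuid': 'ccc', 'provision_state': 'available',
     'capabilities': 'profile:compute,boot_option:local'},
]

print(nodes_matching_profile(nodes, 'control'))  # ['aaa']
```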
14:41:51 dprince: ack, something to discuss further then I guess - I'm just wondering if we have to *always* require it
14:42:11 e.g. can we support a "tripleo lite" mode
14:42:40 shardy: for me the key would be to get Heat to support OS::Nova::Server, similar to slagle's patch but without the extra work
14:43:02 shardy: if that is even possible, just refining the interface there a bit so it seems cleaner
14:43:25 all things to consider
14:43:28 I would not be a fan of trying to support both Nova and not-Nova.
14:44:31 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:45:12 So on the topic of specs - I'm planning to create some series in launchpad (like other projects) so we can track features on the roadmap for newton
14:45:39 folks have started asking about it already, and it'll be easier to track if we can tag either spec-lite bugs or blueprints against a "newton" series
14:45:44 is that OK with folks?
14:45:51 +1
14:46:04 +1
14:46:20 +1
14:46:31 the only effort is ensuring you raise either a bug or blueprint and propose it for the series, then we know it's targeted to Newton
14:46:33 +1
14:47:10 I'd say specs should be optional, particularly for simpler features, as they always get bogged down in implementation discussions
14:47:26 obviously for more complex stuff specs may be posted too :)
14:48:40 Relatedly it'd be good to consider making some milestone releases this cycle (again like other projects), mostly so we give folks better visibility of the release cycle as it passes
14:49:39 Anyway, we can discuss that further on the ML, just getting the idea out there for consideration
14:50:00 #topic Open Discussion
14:50:18 Anyone have anything to discuss?
14:50:21 I'd like to get your opinions about the tempest support patch: https://review.openstack.org/#/c/295844/
14:51:15 sshnaidm: We simply don't have the time budget to run it every commit - are you thinking of the periodic job?
14:51:19 sshnaidm: idea seems fine to me. Just keep in mind that we aren't even close to a point where running that for all the CI jobs is appropriate
14:51:25 I am -1 to adding anything that increases job time
14:51:29 shadower, yes, it should go to periodic
14:51:34 sshnaidm: as a general tool/feature for tripleo.sh it is fine I think
14:51:58 dprince, sure, I plan to start with periodic nonha, and then we'll see
14:52:04 periodic would be fine though
14:52:05 is there any wip to move it into tempest itself?
14:52:23 one new topic I'd like to mention is not landing any more t-h-t features that aren't in the composable roles format
14:52:40 config_tempest.py sounds like a script not specific to tripleo, but one that could be used widely in OpenStack
14:52:50 now that we have the keystone example I think that should be sufficient to convert over to the new format for controller services...
14:52:58 dprince: +1, gnocchi was the last one because it was agreed as a backport exception for mitaka
14:52:59 dprince: ++
14:53:12 nice
14:53:28 are we still committed to testing stack-updates before landing composable roles?
14:53:32 sshnaidm: see my question ^
14:53:40 EmilienM, dmellado is working on a tempest configuration script upstream, but it's not close to finished yet
14:53:54 slagle: we have an upgrades job
14:54:07 dprince: it doesn't test anything
14:54:07 EmilienM, he promises to present it at summit
14:54:11 slagle: understood that isn't where we'd like it to be but that is the bar I think
14:54:24 What is the status of that, it does an update, but not the full version-to-version upgrade right?
14:54:28 slagle: we can't wait on this any further I think
14:54:36 ok, i'm just asking
14:54:39 shardy: wait, won't we run tempest at each commit?
14:54:44 EmilienM: No
14:54:47 an example of a tempest run is here: https://review.openstack.org/#/c/297038/16 - but just an example
14:55:04 I think that's a mistake, we should run at least some basic tests. But that's my opinion
14:55:12 EmilienM, I plan to start with nonha periodic, because it's time consuming now
14:55:13 EmilienM: we don't have time unless we can somehow reduce our CI runtime by ~20mins
14:55:24 yes, time is the bottleneck now
14:55:29 EmilienM: if we can reduce the runtime, we can consider adding it
14:55:30 dprince: we had been in agreement on this, but i had the feeling we weren't any longer, so would just like to clarify
14:55:44 shardy: but what if we drop pingtest and run tempest/smoke instead?
14:55:55 tempest/smoke has 2 scenarios that spawn a VM and ssh to it
14:56:14 EmilienM: tempest isn't our most important issue in all our CI jobs I think. Until the walltime comes down significantly I'd like the talk of tempest to go away I think
14:56:14 EmilienM: we'll have to compare the coverage - IMO pingtest covers things not covered at all by tempest
14:56:26 EmilienM: ping test takes 3 minutes right now, not 20.
14:56:48 At least last I checked after the cirros change merged.
14:57:04 bnemec: yep it is back down to super speedy
14:57:09 mhh ok, everyone seems happy with pingtest
14:57:11 tempest also has zero functional coverage of heat, which is covered by pingtest
14:57:17 tempest smoke is taking 20+ minutes in RDO
14:57:32 EmilienM: everyone wants to see better coverage, but half our CI jobs are getting killed by the infra timeout
14:57:33 * EmilienM stops arguing
14:57:39 shardy: +1
14:57:41 we have to fix that problem first
14:57:42 EmilienM: I wouldn't say we are happy with it. We just can't spare any extra time
14:58:01 It's not that I don't want Tempest, but we aren't in a place where it's practical yet. Unfortunately.
14:58:12 starting with the periodic jobs seems like the most workable compromise
14:58:16 slagle: let's revisit the talk about how to improve the upgrades job in #tripleo perhaps
14:58:19 is getting beefier HW an option for CI?
14:58:28 ya I am pro putting it on the periodic job
14:58:30 bandini: You buying? :-)
14:58:34 slagle: everything, not just composability, would benefit from the upgrades job being better
14:58:40 bandini: there are various options under discussion
14:58:46 bnemec: erm I left my wallet at home :P
14:59:00 2 mins - anything else before we wrap up?
14:59:08 dprince: sure
14:59:20 shardy: I hope something comes out of it, because it seems such an important problem atm
14:59:30 * EmilienM jumps in puppet meeting
14:59:39 Ok, thanks all!
14:59:43 #endmeeting