14:00:17 #startmeeting tripleo 14:00:18 Meeting started Tue Jul 26 14:00:17 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot. 14:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 14:00:22 The meeting name has been set to 'tripleo' 14:00:25 #topic rollcall 14:00:28 0/ 14:00:30 o/ 14:00:31 o/ 14:00:31 o/ 14:00:33 o/ 14:00:34 Hey all, who's around? 14:00:35 o/ 14:00:40 o/ 14:00:41 ¬_¬ 14:00:43 o/ 14:00:55 o/ 14:00:56 o/ 14:00:57 o/ 14:01:09 o/ 14:01:12 o/ 14:01:14 hi 14:01:23 \o/ 14:01:28 #link https://wiki.openstack.org/wiki/Meetings/TripleO 14:01:36 #topic agenda 14:01:49 * one off agenda items 14:01:49 * bugs 14:01:49 * Projects releases or stable backports 14:01:49 * CI 14:01:49 * Specs 14:01:51 * open discussion 14:02:07 #link https://etherpad.openstack.org/p/tripleo-meeting-items 14:02:20 If anyone has any one-off items, feel free to add them to the etherpad 14:02:35 So far we've got two 14:02:54 #topic One-off items 14:03:06 ccamacho: would you like to start? 14:03:13 Sure 14:03:29 Hi All 14:03:53 Working with composable mistral I'm getting some random mem issues, so the question is if those new services should be 14:04:00 So I agree we need to stop the memory usage creep 14:04:02 added enabled by default or not 14:04:14 one of the nice things about composable services is that we can now add things disabled by default 14:04:29 there are a couple of issues though 14:04:32 the thing is those are not enabled by default, how should we test that at least they are being installed properly? 14:04:39 1. How to ensure CI coverage of disabled-by-default things 14:04:43 ccamacho, fwiw the trove submission just passed but is in same situation https://review.openstack.org/#/c/233240/ , we need to understand if we want to enable it by default or not 14:04:43 yeahp 14:04:46 2.
How to allow users to easily enable them 14:05:01 (2) will be helped by https://review.openstack.org/#/c/330414/ which ramishra is working on 14:05:15 we might want to start different scenarios deployment 14:05:22 but for now we've been hacking in OS::Heat::None definitions to disable things in e.g ControllerServices 14:05:28 we could reuse multinode job and have different environments 14:05:33 mmm I see, I was thinking in adding a parameter disabling it and using an env file overriding this parameter 14:05:43 by scenarios I mean: https://github.com/openstack/puppet-openstack-integration#description 14:06:04 ccamacho: I think the end plan here is to allow multiple environments, each which have a fragment of ControllerServices inside 14:06:05 This just sounds like something we need to add to the CI matrix. 14:06:10 so you can build up one big list of services 14:06:31 bnemec: Yup, I guess the question is how do we handle it, when new services are being proposed all the time 14:06:51 we could have a policy of at least a test ci run with the service enabled (like for manila) but it isn't automated obviously 14:06:52 maybe we can have some kind of living document, or CI generated matrix that shows where we cover each service, e.g in which jobs? 14:07:03 couldnt we just emulate puppet scenarios? 14:07:16 we wont be adding anything that is not already in a puppet scenario 14:07:23 marios: Yeah, but we found out the hard way with Sahara what happens when you merge something that's not always CI tested 14:07:51 shardy: right, like we do in Puppet CI (have you seen my link?) 
14:08:13 shardy: ack, just saying, in the absence of automated testing/until then we make it a rule/add to reviewer guidelines doc (we have a spec somewhere right) 14:08:24 EmilienM: aha, yes, something like that would be ideal 14:08:50 shardy: EmilienM, why not use *exactly* that 14:08:51 marios: sure, I guess having passed CI once is better than having passed it never ;) 14:09:02 as in the exact matrix 14:09:08 it's a lot of jobs 14:09:20 we can start by iteration maybe and add a second scenario 14:09:27 they would all be multinode nodepool ones though ya? 14:09:31 yes 14:09:37 I am not sure about the scenarios, if we need more nodes just to cover more services, can't we just increase the overcloud ram and deploy all services in a single job? 14:09:39 we are not testing undercloud for those 14:10:03 we could use periodic jobs as well 14:10:27 gfidente: that's certainly an option, but we'd need a way to keep the list of "all" services up to date 14:10:34 gfidente: That's true, we would conceivably have the ability to deploy different environments. 14:10:37 mhh, periodic job is good but also uncertain people watch it 14:10:39 because the conversation started with not wanted everything always enabled 14:10:57 We're no longer locked into all the environments being identical (in fact, they aren't today). 14:11:21 what's the worker count on CI runs? 14:11:24 EmilienM: we need to surface the results better, e.g via a page on tripleo.org 14:11:42 shardy: Wouldn't the allinone config live in tht? I thought we had all pretty much agreed with that on the list. 14:11:45 we're surely not running load testing there so would it make sense to just limit all services to 1 worker in CI? 14:11:58 jokke_: We already do that. 
14:12:20 bnemec: Yeah, it was more how we handle having aio_one_node.yaml, and controller_enable_everything.yaml etc 14:12:23 http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/test-environments/worker-config.yaml 14:12:34 maybe we can add some validation checks to tht to ensure they're updated 14:12:40 probably be easy enough 14:12:54 +1 14:13:27 bnemec: ach :( 14:13:37 Ok, so it sounds like the consensus is we need to start landing things disabled by default, and work on improving the test matrix/coverage 14:13:47 ccamacho: anything else to add on this before we move on? 14:13:51 ack thanks! 14:14:09 I'll follow up with Rabi re the heat feature to enable composable Services lists 14:14:15 (he's already posted a WIP patch) 14:14:40 #info need to start landing things disabled by default, and work on improving the test matrix/coverage 14:14:47 Ok so the next one is mine 14:15:02 I've started pushing up patches working towards custom-roles: 14:15:12 https://review.openstack.org/#/q/status:open++topic:bp/custom-roles 14:15:28 basically it's a big refactor moving role specific things out of overcloud.yaml 14:15:41 followed by switching to a jinja templated overcloud.yaml 14:16:00 I had been blocking on the mistral deployment stuff, so we can wire in jinja via a mistral action 14:16:13 but instead, decided to rebase the series so we can start landing the refactoring 14:16:29 any help with reviews would be great, it's going to be at least 20 or so patches I think 14:16:39 I'm trying to keep them fairly small 14:16:47 good approach 14:16:59 One question is what we do with the inconsistent names long term 14:17:45 e.g controllerImage and NovaImage 14:18:00 maybe we can provide an upgrade path 14:18:02 for now, I'm assuming we leave those alone, and just move them into the per-role templates 14:18:22 but later, we may want to deprecate them and move to more consistent naming, e.g ComputeImage and ControllerImage 14:18:32 EmilienM: Yeah, we don't have a good way 
to do that atm 14:18:49 with j2 templates we should be able to standardize them and still pick the old ones if the new one is not present 14:18:50 i'd think we need to at least deprecate them before removing them 14:18:53 maybe another heat feature to enable deprecated names (similar to how oslo.config does it) with a warning 14:19:19 jokke_: The j2 templating happens before you get parameters from the user 14:19:19 shardy: Yes please. 14:19:55 Ok, so I think we'll probably run out of time for the heat feature this cycle, but I can look into it and chat to those who may have time to work on it 14:20:13 shardy: what's the point of that? So we generate out of templates with default values and then overwrite them with the user ones? 14:20:16 are people OK if we get the first cut of custom roles done maintaining the old parameter naming, for compatibility? 14:20:36 yes 14:20:46 shardy: Doesn't sound like we have much choice for now. 14:20:51 jokke_: So you can generate a template that will deploy a "ContrailController" cluster, or dedicated cluster for DB etc 14:21:05 We don't have the ability to deprecate, and we can't just remove the old names. 14:21:23 jokke_: https://review.openstack.org/#/c/315679/19 shows the approach 14:21:44 it's only dynamic in as much as it enables user defined roles beyond the 5 we support by default 14:21:56 bnemec: well, there are hacks we could do in the templates 14:22:05 shardy, just to be sure, you're only talking about the parameter name, or the nova instance name as well?
14:22:14 like adding the old and new parameters, then joining them 14:22:27 shardy: gotcha 14:22:33 but we'd need to invent our own warnings method, such as having a "deprecated" parameter_group 14:22:53 I actually started adding such a group in a few places, but I'm not sure if it's the best approach 14:23:03 gfidente: Only the parameter names 14:23:34 Anyway, feedback welcome on the reviews - I'm hoping we can iterate fairly fast to make the n3 deadline 14:23:53 bandini: NG HA update? 14:23:57 sure 14:24:06 Just a short update on the HA Next Generation work. 14:24:15 Testing has been fairly good so far and QE has been playing with it internally for more than a week now. 14:24:25 I discussed with gfidente a bit how we can make the landing of it a bit easier. We came up with a very non/low-invasive plan with small changes. 14:24:47 It will require aodh profile stuff to land (https://review.openstack.org/#/c/333556) and a couple of other reviews that are quite independent of the HA NG specifics. 14:24:59 Namely: 14:24:59 nova constraints moving to their respective profiles: https://review.openstack.org/347309 https://review.openstack.org/347310 14:25:02 fake openstack-core role: https://review.openstack.org/#/c/347315/ https://review.openstack.org/#/c/347314/ 14:25:05 Once those are landed the remaining work is minuscule (couple of lines, puppet settings). 14:25:21 That's all from my side. Are there any questions/concerns on this work in general? 14:25:47 Folks regarding adding the ovs-dpdk to tripleo, we have made good progress 14:25:51 bandini: when we switch to non-pacemaker for many services, is the plan to remove the pacemaker templates/profiles for those services? 14:26:00 I can also make a more detailed report on the ML if IRC is not the best medium ;) 14:26:02 when we're confident it all works obviously :) 14:26:16 bandini: ++ for ML 14:26:23 shardy: yeah.
correct 14:26:30 EmilienM: ack, will do 14:26:32 bandini: A ML thread would be good, with a list of patches (or a link to an etherpad with those links) 14:26:35 thanks 14:26:45 EmilienM: do you have a very rough ETA for the aodh profile work? 14:26:58 gfidente: did you add the HCI ceph item? 14:26:59 just a ballpark figure 14:27:38 shardy, I did yes, wanted to give a justification to https://review.openstack.org/346897, which is using a dedicated ceph storage node as we used to do before 14:27:51 The synchronized steps also depends on landing aodh, and removing the last few things from puppet/manifests/* 14:28:14 yeah so the submissions to deploy ceph/jewel HCI are all up, the top-most is https://review.openstack.org/#/c/338088/ 14:28:44 gfidente: ack, I think we'll want to test HCI and a dedicated CephStorage node anyway tbh 14:28:46 yet it only works if we add a depends_on for the compute role on controller, which we clearly don't want 14:29:15 shardy, right, so until the synchronization is sorted and we can move to HCI, we're stuck with 'dedicated' node 14:29:51 gfidente: I think we can generate the per-step deployments in overcloud.yaml as part of the custom-roles work 14:30:14 but first we need the various per-role manifests to be empty I think, so we only use step_config 14:30:17 shardy, ack, will be helping there with reviews or as I can 14:30:17 https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/manifests/overcloud_compute.pp 14:30:36 is anyone looking at removing these last few bits from the manifests, EmilienM perhaps? 
14:30:41 yes 14:30:47 but I still need help with the last nova bits 14:30:50 ::nova specially 14:30:53 https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/manifests/overcloud_controller.pp 14:31:10 https://review.openstack.org/325983 and https://review.openstack.org/328347 14:31:18 I'm having trouble making it work 14:31:21 Allah is doing 14:31:29 I also took over Aodh 14:31:30 sun is not doing Allah is doing 14:31:31 EmilienM: Ok, thanks, we can get some reviews on there and hopefully get those working 14:32:00 gfidente: Ok, anything else on this before we move on? 14:32:17 shardy, nope, good for me unless there are questions 14:32:18 can we boot the bot? 14:32:25 I asked on infra channel 14:32:31 EmilienM++ 14:32:41 bots are doing! 14:33:06 skramaja: You have an update re SR-IOV and DPDK? 14:33:10 related to SRIOV & DPDK tripleo specs, we have been working on ironic spec for setting the kernel args, but it has got a heavy resistance not to set kernel args via ironic.. instead suggested to use user_data.. 14:33:32 we wanted to know if there are any suggestions or known issues around it.. 14:33:39 pabelanger: ban Guest_48743 please 14:33:44 because we need to reboot after setttgint the kernels 14:33:54 $var is not doing 14:33:58 *after settign the kernel args..
14:34:03 trown: lolz 14:34:15 /ignore ftw 14:34:17 would like to know if any suggestions around it.. 14:34:29 ironic spec - https://review.openstack.org/#/c/331564/ 14:35:00 Hmm 14:35:19 skramaja: you can pass user_data via TripleO, but won't that mean you need to reboot? 14:35:26 * jroll sees kernel args stuff and listens in 14:35:35 yes.. reboot is needed.. 14:35:41 that's the concern.. 14:36:12 kernel args need to be set before os-net-config.. 14:36:28 as dpdk driver binding needs those args mandatorily.. 14:37:54 skramaja: Ok, well a solution probably is to pass something to cloud init via a firstboot script 14:37:59 http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/extra_config.html#firstboot-extra-configuration 14:38:01 Allah is giving some pain here.. 14:38:13 however we'll have to be sure puppet doesn't then go and reconfigure them later: 14:38:18 shardy: yes.. 14:38:20 https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/kernel.yaml 14:39:00 shardy: right, config should only be done by puppet 14:39:20 EmilienM: but we can't do that here, because puppet won't run before os-net-config 14:39:29 ok 14:39:31 shardy: do u mean this will affect puppet config? 14:39:42 so we need to sync what is done before and what is done in the tht service 14:39:43 skramaja: can you start a ML thread, and we can follow up with details there? 14:39:56 ok.. sure.. 14:40:05 skramaja: I mean if you set some config value via cloud-init then reboot, you don't want puppet to later set it back again 14:40:21 ok.. 14:40:24 skramaja: thanks!
14:40:29 shardy, can we get acceptance on DPDK Spec https://review.openstack.org/#/c/313871 14:41:15 beagles: you previously reviewed that, can you circle back and vote on it? 14:41:24 shardy: yup 14:41:31 karthiks: will revisit it later today, thanks for the reminder 14:41:57 Ok, time to move on as we're running short on time :) 14:41:58 Thanks shardy. 14:42:01 pabelanger is not doing 14:42:01 #topic bugs 14:42:23 #link https://bugs.launchpad.net/tripleo/ 14:42:39 #link https://bugs.launchpad.net/tripleo/+milestone/newton-3 14:43:13 We have 55 bugs targeted at n-3, so help burning down those via prioritizing reviews would be good 14:43:29 also please ensure anything you would consider a release blocker gets targeted to n3 14:43:43 Any other bug related things to discuss? 14:44:33 #topic Projects releases or stable backports 14:44:49 So, last week we were waiting on stable CI promotion to cut a stable release 14:45:20 does anyone have a link to the recent results, has mitaka periodic job passed? 14:45:33 we really need those results on tripleo.org ... 14:45:43 * shardy had a link but is failing to find it 14:45:46 shardy: there's a patch to add it back...
14:45:54 shardy: everything promoted last night i believe 14:45:56 including master 14:46:05 slagle: aha, nice, OK will check that patch then, thanks 14:46:20 https://review.openstack.org/308865 14:46:24 slagle: great, Ok I'll look at tagging a stable release this week then based on the most recent good promote 14:46:24 there's a script you can use anyway 14:46:36 slagle: ack, thanks 14:46:44 shardy: https://dashboards.rdoproject.org/rdo-dev that? 14:46:45 shardy: there was a hiccup in the mirror-server, it ran out of space, but derekh fixed it 14:46:59 so if the promotes didn't happen for liberty/mitaka, we can redo them 14:47:12 master has been promoted however 14:47:21 EmilienM: Yeah, I've been checking that, but it doesn't show the tripleo stable periodic jobs? 14:47:28 shardy: it doesn't, right 14:47:33 ya, I've freed up space and will push a patch later to have it down daily in future 14:47:38 it's good to check current-tripleo tho 14:47:43 shardy: I can work on it for tripleo.org if you want 14:47:54 oh wait, liberty/mitaka doesnt promote does it? 14:48:08 anyway, the latest passed last night 14:48:14 slagle: No, we just need a green run to base the release on 14:48:14 slagle: correct, we only have promote on master 14:48:19 looks like periodic mitaka is passing 14:48:20 #topic CI 14:48:31 (since we're now talking about CI anyway ;) 14:48:50 weshay: is there anything you need from us re the RDO promotion @ 20d ? 14:49:04 we looked into the issue on Friday, which turned out to be quickstart, right? 14:49:23 we should audit to ensure there's nowhere else it's modifying config files managed by puppet... 14:49:23 good question.. we've been very close the last few days.. 
we keep hitting new valid issues 14:49:38 https://bugs.launchpad.net/tripleo/+bug/1605876 will need to be fixed 14:49:38 Launchpad bug 1605876 in tripleo "openstack baremetal import --json instackenv.json fails with "PluginAttributeError: _auth_params"" [Critical,In progress] - Assigned to Brad P. Crochet (brad-9) 14:49:51 since osc removed the private attrs that mistralclient was trying to use 14:49:58 weshay: so that will probably block you (again) next 14:50:06 shardy: the problem Friday was PEBKAC it was not quickstart 14:50:30 trown: Ok, the bug said quickstart was modifying ironic.conf on the undercloud, was that correct? 14:50:42 slagle, I think apevec may have downgraded that package or put in some kind of a temp workaround 14:50:44 * weshay checks 14:50:45 shardy: the problem just before that was a bad instack-undercloud patch... so pointing fingers at configuring a quickstart-specific thing outside of puppet is not helpful 14:51:03 in any case RDO promote will pass today 14:51:07 trown: I'm just restating what was in the bug in an attempt to understand the root cause 14:51:26 nobody is pointing fingers here, and FWIW we all burnt a ton of time helping/investigating 14:51:44 yep 14:51:45 bugs have root causes, that's not pointing fingers 14:51:56 ya there are 25 issues in the etherpad 14:52:03 I have burnt plenty of time 14:52:34 weshay: yes he did, but that's not going to work given the recent patch to osc 14:52:41 trown: Ok, let's follow up after the meeting, I just want to understand what broke, why, and how we make sure it won't happen again 14:52:52 (from tech pov, configuring things outside puppet in tripleo is a bad idea) 14:52:59 weshay: we'll need a proper fix 14:53:22 thrash: have you been able to look at https://bugs.launchpad.net/tripleo/+bug/1605876 at all?
14:53:22 Launchpad bug 1605876 in tripleo "openstack baremetal import --json instackenv.json fails with "PluginAttributeError: _auth_params"" [Critical,In progress] - Assigned to Brad P. Crochet (brad-9) 14:53:24 derekh: Do you have an update re rh1 before we run out of time? 14:53:35 RE: rh1 - I had a few lines ready to describe progress/problems but time is short so 14:53:37 the main question I think we need to answer is nonha or ha overcloud? 14:53:37 I'm inclined to go for a nonha myself, especially if we have the scripts to the point where we can just redeploy at will 14:53:37 mainly because it will be easier to debug and we don't need to aim for 5 9's in ci 14:53:44 slagle: yeah.. I've actually posted a proposed fix to mistralclient 14:53:49 EmilienM: it is 100% specific to quickstart, we have to tell ironic to use unprivileged libvirt... nobody would use it outside of quickstart 14:53:51 +1 to nonha 14:53:54 thrash: sweet, thx 14:53:57 slagle: for some reason it's not registering on the bug... 14:54:04 EmilienM: give me a generic hiera interface on the undercloud and I will do it with puppet 14:54:12 slagle: https://review.openstack.org/#/c/347354/ 14:54:15 trown: we have that 14:54:24 trown: see hieradata_override in undercloud.conf 14:54:28 trown: sure, can I see the code that configures it? so I can look what puppet class it is 14:54:36 shardy: we'd hope to have the cloud ready before the end of the week (well, it better be, I'm on PTO next week) 14:54:45 trown: you can set any hieradata 14:54:47 \o/ 14:55:27 derekh: +1 on nonha, and great to hear it's nearly ready 14:55:49 slagle: hmm did not know about that thanks... but we have to support liberty and mitaka too 14:55:51 trown: Yeah, if possible it'd be better to create a quickstart specific hiera override file, that's then configured in undercloud.conf 14:56:03 shardy: ack, I'll post the summary in #tripleo after the meeting 14:56:10 derekh: nice, thanks!
14:56:12 trown: please give me the link of the patch that configures it and I'll see how to do it with puppet/hiera in instack 14:56:58 Ok, we already discussed some specs, so I'll skip to open discussion 14:57:03 #topic open discussion 14:57:06 3 mins 14:57:08 we need some help in reviewing our patches for SR-IOV and DPDK automation 14:57:16 anyone have anything remaining to discuss (quickly ;) 14:57:28 I'm the PTL for manila and I'm very interested in seeing this merge: https://review.openstack.org/#/c/188137/ 14:57:45 trown: right, ok. it's possible you can't set this anyway via hiera 14:57:45 bswartz: we are close :) 14:57:50 Just a quick note: it looks like the mistral node registration and introspection is slower than what we had before. We added a few minutes per CI run when that merged. 14:57:54 bswartz: Hi! Yup, I think it's very close now 14:57:59 sorry that it's taken so long 14:58:06 Also I need to rebase this (very old) patch and I'm looking for help on how to do that: https://review.openstack.org/#/c/188138/ 14:58:30 shardy: we are new to composable roles and we are adding a lot of THT for SRIOV and DPDK 14:58:34 bnemec: there's probably some stuff discussed in the bug from friday we can fix that will help 14:58:45 bswartz: i could help out once we land manila generic 14:58:46 like, setting nodes to states then setting them again etc 14:58:49 it would be great to get earlier feedback.. since we are targeting for n3 14:58:54 thanks marios 14:59:03 shardy: Ah, that would be good. Trying to figure out why so many jobs are taking >2 hours now. 14:59:12 That doesn't explain the whole increase, but it's a fair chunk of it. 14:59:32 bswartz: +1 if you can work with marios that would be great 14:59:33 skramaja: yeah, we are reviewing it 14:59:38 will need some refactoring I think 14:59:43 thanks EmilienM.. 14:59:43 Thanks EmilienM 14:59:48 bnemec: the workbooks also take a while to load i've noticed 15:00:02 slagle: At the end of the undercloud install?
15:00:03 Ok, we're out of time -> #tripleo 15:00:04 we lost a minute or so there 15:00:05 bnemec: yes 15:00:06 thanks everyone! 15:00:12 #endmeeting