14:00:47 #startmeeting tripleo
14:00:47 #topic agenda
14:00:47 * Review past action items
14:00:47 * One off agenda items
14:00:47 * Squad status
14:00:47 * Bugs & Blueprints
14:00:47 Meeting started Tue Oct 10 14:00:47 2017 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:48 * Projects releases or stable backports
14:00:48 * Specs
14:00:48 * open discussion
14:00:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:49 Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:50 Hi everyone! who is around today?
14:00:51 folks, where i can see pacemaker detailed logs, for example reason why node not become as master etc
14:00:51 The meeting name has been set to 'tripleo'
14:00:53 o/
14:00:54 hi
14:00:55 o/
14:01:00 Hola!
14:01:01 o/
14:01:05 o/
14:01:07 o/
14:01:08 0/
14:01:08 o/
14:01:09 Hey
14:01:11 o/
14:01:17 hi
14:01:21 o/
14:01:22 o/
14:01:25 o/
14:01:32 o/
14:01:56 o/
14:02:32 o/
14:02:47 hey everyone, let's go
14:02:51 #topic review past action items
14:02:52 mwhahaha to follow up with shardy about services and ovb jobs
14:03:10 so I think shardy piped in last week after i made this item, not sure we have much to do on this at the moment
14:03:36 i think there's also efforts to move ovb to rdo cloud so it's probably not the best time to make changes. so we can circle back around to this later in the cycle
14:04:05 the next item was ceph-upgrade ci
14:04:21 fultonj, gfidente: any update on ceph-upgrade ci?
14:04:27 that's in the integration squad pad
14:04:35 k we'll get to that later
14:04:48 ack
14:04:50 mwhahaha to move bugs <= medium to queens-2 and review > medium for validity - DONE
14:04:57 that's in on the past action items from last week
14:05:15 o/ sort of ci squad is also having retro
14:05:32 #topic one off agenda items
14:05:32 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:05:33 trown: now?
14:05:37 EmilienM: ya
14:05:43 trown: ouch, bad timing :/
14:05:43 (mwhahaha) Newton EOL & CI review
14:05:57 fyi, i've asked for a stay of EOL for tripleo projects
14:05:58 http://lists.openstack.org/pipermail/openstack-dev/2017-October/123221.html
14:06:05 but it's been asked to review with the stable team what that means
14:06:16 have we got feedback on that?
14:06:16 so i'm looking for some volunteers to help with that effort
14:06:26 yea tonyb wants to chat about the impact
14:06:34 mwhahaha: count me in
14:06:43 he said it was probably ok but wants to understand what that means
14:06:57 well, see if we can keep branches and stable jobs
14:07:04 #action EmilienM to sync with tonyb about stable/newton retirement CI
14:07:07 we're still doing backports now
14:07:23 shardy: tagged you in a doc
14:07:38 so please also take some time to review the email thread around that and chime in if you have input
14:07:41 we can expect things going down in future for newton but for now we still have backports
14:07:49 ok I will
14:07:55 moving on
14:07:56 (jaosoroir) Blueprint: Add option for containers to log to stdout
14:07:58 * mwhahaha pokes jaosorior
14:08:10 hey!
14:08:21 So, just giving a heads up to people about that blueprint
14:08:32 Here's the link to the spec https://review.openstack.org/#/c/510001/
14:09:06 But in a nutshell, the plan is to get all of the containers to log to stdout (as one would usually expect with containers)
14:09:23 and let the docker logging driver handle the logs
14:09:39 getting some extra metadata with that
14:09:57 jaosorior: so would you not log to /var/log/containers on the container host?
14:10:00 jaosorior: can we do that and still log to file on the host?
14:10:08 larsks: ^^
14:10:21 shardy: sure, we can double-log if somebody wants that.
14:10:36 But the two big advantages here are:
14:10:46 fultonj, shardy: though, the plan right now is to make it pluggable. So we would still log to /var/log/containers by default
14:10:50 (a) lots of useful metadata, including k8s metadata about the pod in which a container is running, and
14:11:04 (b) we support most services automatically, rather than requiring some sort of per-service log configuration.
14:11:35 sounds like some good conversation for the spec review :D
14:11:41 with k8s logging to /var/log local bindmount essentially becomes an antipattern
14:11:52 jistr: agreed!
14:11:56 larsks: ack, and this will just work with the existing fluentd integration?
14:11:58 #action team to review logging to std spec https://review.openstack.org/#/c/510001/
14:12:06 (but until then it's fine/useful imo :) )
14:12:24 jaosorior fwiw, this is what the ceph containers currently do
14:12:27 shardy: I think it would be an "instead of" rather than "works with".
14:12:30 https://bugs.launchpad.net/tripleo/+bug/1721841
14:12:32 Launchpad bug 1721841 in tripleo "ceph docker containers do not log to /var/log/containers" [High,Triaged] - Assigned to John Fulton (jfulton-org)
14:12:46 shardy: unfortunately it won't "just work", either we rework the fluentd configuration to take input for journald, or we introduce another forwarder (was planning to use rsyslog)
14:12:48 larsks: ack
14:12:52 jaosorior jistr they log to stdout and rely on the dockerd journal socket
14:13:05 gfidente: actually, we took the approach based on input from folks working on Common Logging, and Ceph :D
14:13:06 (d was in the wrong place)
14:13:07 jaosorior: Ok, well we'll need to figure out the operator impact of that
14:13:26 I think the local file logging will be needed at least for this pre-k8s iteration of TripleO + containers
14:13:27 shardy: well, if they deploy the default, everything will stay the same
14:13:40 diff directions but coming together (wherever it is) is good
14:13:47 shardy: one would need to manually choose to deploy to stdout (with an env file)
14:13:59 jaosorior: ++ sounds good then
14:14:42 Numan Siddique proposed openstack/tripleo-heat-templates master: ovn: Remove setting of DockerNeutronApiImage param https://review.openstack.org/510900
14:14:59 jaosorior thanks, any other things you want to bring up around this?
14:15:24 so, folks, please check out the spec, and the reviews https://review.openstack.org/#/q/(topic:+bp/logging-stdout-rsyslog+OR+topic:tripleo-logging)+status:open
14:15:39 that was all :D thanks everyone
14:15:43 cool thanks
14:15:44 (d0ugal/fultonj) Improve user experience for when ansible fails
14:15:48 * mwhahaha pokes fultonj and d0ugal
14:15:54 ack
14:16:10 there are two examples in the pad showing what the tripleo ansible errors look like
14:16:12 ack also
14:16:39 e.g. http://paste.openstack.org/show/623232
14:16:56 e.g. users may not know to ssh in and find the playbook and run it
14:17:08 same for the heat > mistral > ansible cycle example
14:17:10 we can document
14:17:12 but....
14:17:30 if we can drop them into an ansible shell with an inventory so they can do ad hoc commands, that would be nice IMO
14:17:44 do we solve w/ http://lists.openstack.org/pipermail/openstack-dev/2017-September/122234.html
14:17:56 yeah or at least print a message showing how to re-run the step via config download & ansible-playbook
14:18:03 i think situation with cryptic failures in Ansible will get better w/ config-download
14:18:04 yea +1
14:18:18 there's also just the terrible error messages we generate
14:18:33 or even lack thereof sometimes
14:18:34 Can ansible output anything other than its own formatted text?
14:18:39 Oliver Walsh proposed openstack/puppet-tripleo stable/newton: Handle duplicate/invalid entries in migration SSH inbound addresses https://review.openstack.org/510799
14:18:46 or, in other words, can we get json or something we can make sense of?
14:18:50 d0ugal: well, you can write out data in whatever format you want...
14:19:00 d0ugal: I believe that shadower did some formatting for ansible output for tripleo-validations
14:19:06 Or have a plugin collect execution info and dump that for you.
14:19:15 jtomasek: ah, I thought that looked much neater, I'll need to take a look
14:19:16 Oliver Walsh proposed openstack/puppet-tripleo stable/newton: Handle duplicate/invalid entries in migration SSH inbound addresses https://review.openstack.org/510799
14:19:21 larsks: right, that would be useful.
14:19:30 sdoran do you know if ansible can output better error messages or there's a way we can improve that?
14:19:40 even without nicer ansible output the user needs to get to it
14:19:42 oh logging in Pike, how i loathe thee
14:19:52 That's a fun topic.
14:19:56 at the moment they have the examples in the pad
14:19:58 d0ugal: there may be a json formatter already in ansible that you should just enable (but I've forgotten and never really tried it and may in fact be hallucinating)
14:20:04 it would probably help if we output the stack failures list with --long by default when a deploy fails
14:20:08 Do you mean output to standard out?
14:20:23 shadower: :) I did a quick google before, I'll look around again
14:20:24 Usually -vvvv gives you the best debugging output.
14:20:28 sdoran: i think currently we only output the stderr
14:20:33 I did have some cool dashboard with kibana built with the rsyslog -> elastic work we have in browbeat...
14:20:41 But the preferred method today is to enable logging.
14:20:48 d0ugal: if there isn't, this is what tripleo-validations does: https://github.com/openstack/tripleo-validations/blob/master/validations/callback_plugins/validation_output.py
14:20:50 The log file gets a lot of good info if you enable it.
14:21:04 d0ugal: as larsks said, you can make it output anything you want
14:21:15 d0ugal, shardy: so can we enable the logging and capture that as part of the execution workflow?
14:21:45 d0ugal: I actually planned to do a json output and have the tripleo UI consume that, but got called away before I got the chance
14:21:45 mwhahaha: yes we could, or perhaps modify the heat-config-ansible hook to capture and send back better data
14:21:51 allowing for additional debugging with some id? ie go download the log
14:22:01 mwhahaha: this will obviously get easier if/when we move to driving ansible via mistral outside of heat
14:22:31 We need to make sure we can display -vvvv to the user in a sensible way, since the output could be huge.
14:22:42 "_send back_" is the important part IMO
14:22:45 ok so it sounds like we have some possibilities but probably need to PoC and investigate our logging options for the ansible execution
14:22:54 yup, sounds good
14:23:01 who can spend some time investigating?
14:23:09 * mwhahaha prepares the action item
14:23:18 fultonj: yeah, well we do have a transport for that already, it's just perhaps not capturing/filtering all the right data
14:23:28 mwhahaha: I'll take a look
14:23:28 I would say the best way to do logging for later analysis would be to enable logging and ship it to an analysis tool like an elastic stack.
14:23:30 mwhahaha shardy I think execution could be going away in queens
14:23:35 jistr ^
14:23:39 You could have it just forward the log.
14:23:42 #action d0ugal investigate ansible logging options around execution to hopefully improve debugging/UX
14:23:46 shardy: in case we plan to let user re-run certain step manually, it would be very good to provide that functionality via API too
14:23:57 fultonj Ben England recommends using ANSIBLE_STDOUT_CALLBACK=debug for ansible logging additional to -vvv
14:24:09 jtomasek: ack, yes it should be possible via a similar workflow to that being implemented for minor updates
14:24:14 gfidente: that's fine but we could also have a wrapper around the raw playbook execution that includes log capture
14:24:16 e.g config download then run ansible via mistral
14:24:18 You could also use a callback plugin to capture job data, but I think you'd get more bang for your buck by just grabbing the log file.
14:24:21 export ANSIBLE_LOG_PATH=~/ansible.log
14:24:25 export ANSIBLE_DEBUG=True
14:24:30 d0ugal: mwhahaha shardy i was distracted by the RH openstack meeting
14:24:32 rook: you can -v all you want today
14:24:38 call back has a wip
14:24:41 sure fultonj
14:24:52 shardy: sounds good
14:24:53 d0ugal: mwhahaha shardy but in the upgrade dfg we are currently facing the ansible output issue
14:24:53 rook: i mean it's in the THT already if you want it
14:24:55 gfidente, shardy: right re execution going away possibly. We want to rewrite service_workflow_tasks into ansible essentially
14:25:05 fultonj the additional env var.
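
The environment variables mentioned above (ANSIBLE_STDOUT_CALLBACK=debug, ANSIBLE_LOG_PATH, ANSIBLE_DEBUG) could be wired into a small wrapper around the playbook run, roughly along the lines of the "wrapper ... that includes log capture" idea. This is only a minimal sketch, not what TripleO does: the helper names `ansible_debug_env` and `ansible_playbook_command` are hypothetical, and the playbook and inventory paths are placeholders.

```python
import os


def ansible_debug_env(log_path="~/ansible.log", base_env=None):
    """Return a copy of the environment with the debugging variables
    from the discussion set. base_env defaults to os.environ."""
    env = dict(os.environ if base_env is None else base_env)
    # Write a persistent log file in addition to console output.
    env["ANSIBLE_LOG_PATH"] = os.path.expanduser(log_path)
    # Very noisy internal debug output; useful when chasing failures.
    env["ANSIBLE_DEBUG"] = "True"
    # The "debug" stdout callback prints stdout/stderr in readable form.
    env["ANSIBLE_STDOUT_CALLBACK"] = "debug"
    return env


def ansible_playbook_command(playbook, inventory, verbosity=3):
    """Build an ansible-playbook argument list (hypothetical helper).
    verbosity=3 yields -vvv, as suggested in the meeting."""
    cmd = ["ansible-playbook", "-i", inventory]
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)
    cmd.append(playbook)
    return cmd
```

A caller would then pass both to something like `subprocess.run(cmd, env=env)` and collect the file at ANSIBLE_LOG_PATH afterwards; how that log gets sent back to the operator is the part still open in the discussion.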
14:25:07 (following the earlier discussion on ML)
14:25:10 so we are thinking on improving it quickly
14:25:10 gfidente: yeah this gets easier if we do simplify the layers driving ansible
14:25:20 mwhahaha shardy so it's in the mistral ansible action that we'd need to parse output, not from execution anymore
14:25:23 rook: i'll show you after
14:25:37 shardy: FYI WIP is here https://review.openstack.org/#/c/510122 https://review.openstack.org/#/c/510781/
14:26:27 shardy: already PoC tested with bogus tasks, now we need to try and rewrite the useful tasks for k8s, ceph-ansible etc.
14:26:28 matbu: facing how so? figuring out failures or do you need to parse output
14:26:42 mwhahaha: facing failure
14:26:44 with slagle's work and ansible driving more do we get this there? but pike has the issue?
14:26:46 jistr: ack thanks will check those out
14:26:53 jistr: i saw the tripleo-common one this morning and meant to ask you if we already have those defined somewhere (the undercloud_deploy_tasks)
14:26:56 matbu: ok d0ugal is going to investigate some options so maybe you can work with him
14:27:11 mwhahaha: yep we have some ideas too
14:27:14 jistr: but i see your comment above not yet
14:27:15 so sounds good
14:27:26 ok yea work with him and provide some ideas
14:27:32 marios: yea not yet, all is to come. I hope to play with Kubespray first, and we could hopefully use some patterns out of there for the rest.
14:27:49 ok any other important comments on the ansible logging or can we move on?
14:27:53 jistr: thanks
14:28:04 fultonj: yeah there's probably a short term and (different) long term answer but there's probably a common piece around filtering the ansible output to surface the error
14:28:12 fultonj, d0ugal thanks for bringing this up. sounds like we have a bunch of folks running into the same problems
14:28:29 Merged openstack/tripleo-heat-templates stable/pike: Add IronicPxe to the default controller https://review.openstack.org/507981
14:28:53 rook: https://github.com/openstack/tripleo-common/blob/master/workbooks/ceph-ansible.yaml#L10
14:29:06 ok moving on to squad status
14:29:07 #topic Squad status
14:29:16 ci
14:29:16 #link https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:29:28 upgrade
14:29:28 #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:29:40 containers
14:29:40 #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:29:52 dprince, mandre -^ missing today's status
14:30:13 Marios Andreou proposed openstack/tripleo-heat-templates master: EARLY WIP: Convert tags to when statements for Q major upgrade workflow https://review.openstack.org/510902
14:30:14 integration
14:30:15 #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:30:26 #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:30:44 jtomasek: thanks! i'll add it in for next week
14:31:09 @mwhahaha: thanks
14:31:15 validations is still missing a status etherpad
14:31:24 networking is also missing a status etherpad
14:31:48 mwhahaha, I sent an email out this morning "[tripleo][networking] Organizing the networking squad "
14:31:58 * mwhahaha hasn't gotten to his email
14:32:01 sounds good
14:32:03 i'll keep an eye on it
14:32:13 mwhahaha, basically the networking squad "wasn't" but with everything that's been going on, I think it's time to get it together
14:32:29 (or past time really ... )
14:32:29 #link http://lists.openstack.org/pipermail/openstack-dev/2017-October/123356.html - networking squad
14:32:42 beagles: yea sounds good, thanks for starting the effort
14:32:50 beagles now how can the networking squad not be together
14:32:56 am I the only one seeing the irony?
14:33:00 :)
14:33:07 nice summaries, I found it very useful to have async status
14:33:14 d0ugal/thrash: do you guys have a status update for workflows?
14:33:36 * gfidente yes you were
14:33:42 mwhahaha: making good progress on bp's to support ui.
14:34:03 thrash: thanks, let us know if you need anything specific. Also make sure to keep the bps up to date
14:34:07 mwhahaha: should have most with patches at least posted by M1
14:34:11 * mwhahaha will be mentioning bps next
14:34:39 there's also a python3 squd listed but i'm not sure anyone is actually doing any python3 specific things
14:34:51 s/squd/squad
14:35:20 so that's all on the status, thanks everyone for providing them
14:35:35 no we can probably remove that one
14:35:55 ok
14:36:05 #action mwhahaha to remove python3 squad from the list
14:36:12 #topic bugs & blueprints
14:36:12 #link https://launchpad.net/tripleo/+milestone/queens-1
14:36:12 For Queens we currently have 69 (+7) blueprints and about 444 (-32) open bugs. 193 queens-1, 254 queens-2 and 3 queens-3. Please take some time to review your blueprint status and make sure it is properly up to date. queens-1 is next week!
14:36:55 So yea queens-1 next week. please update bugs/blueprints if they won't merge
14:37:09 mwhahaha: for this one https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow i repurposed it this morning after chat with matbu, so we can track upgrades for Q... i'll retarget to Q3
14:37:22 marios: ok thanks i was going to ask about that one
14:37:28 q3
14:37:32 thanks
14:37:43 I thought the workflow would be for m2
14:37:49 so m3 we spend time on stabilization
14:37:51 marios: yea m3 might be a bit late, any way to get that in m2?
14:37:58 mwhahaha: ack
14:38:19 marios: do you think it's realistic to have it for m2? like the basic workflow?
14:38:45 EmilienM: we can try it *might* be possible... the workflow in theory isn't radically different to what we have in some parts already
14:38:52 (ansible etc)
14:38:55 yeah
14:38:57 EmilienM: but T&C apply :)
14:39:28 EmilienM: but today we are still landing things for P so...
14:39:38 indeed
14:39:42 yea we have to catch up or it'll keep slipping
14:40:39 marios: if you have things that others can assist with it would be good to publicise some of the plans
14:40:55 maybe we can get additional assistance to land either pike or queens efforts
14:41:08 mwhahaha: ack we have a spec already (for example) linked in the upgrades squad status for interested folks
14:41:12 mwhahaha: as starter at least
14:41:16 marios: is there work on the CI space we need to prioritize to help you in the testing? or all is already under control?
14:41:25 EmilienM: marios (in a mtg) but Yes it's realistic
14:41:47 EmilienM: it is on our radar that we have a basic keystone only job in place for m2
14:42:09 EmilienM: and yeah we will reach out to ci squad
14:42:23 sounds like there's some movement on it, so let's revisit it in ~2 weeks or so
14:42:47 marios: cool
14:43:18 #action marios and matbu to provide an upgrade workflow status in 2 weeks (reminder)
14:43:30 so the other blueprint i wanted to ask about was https://blueprints.launchpad.net/tripleo/+spec/ovs-container-support
14:43:37 EmilienM: who can I talk to about the oooqs experimental gate job? I would like to know where tempest runs from and how that external vxlan tunnel is being used
14:43:48 dprince, Slower, beagles: do we have any additional info on the ovs containerization?
14:44:29 trozet: we can talk about that after the meeting if you don't mind
14:44:30 marios: EmilienM actually we have an *almost* noop workflow in reviews
14:44:42 mwhahaha, I don't have any at the moment - its the next thing I'm working on after finishing off some octavia related stuff
14:44:53 EmilienM: ok thanks
14:44:56 mwhahaha, others might have some updates though...
14:45:00 beagles: ok so we need to get that figured out asap
14:45:05 matbu: cool add it to the squad status etherpad?
14:45:12 mwhahaha, yep
14:45:46 i'll keep bringing it up as a reminder
14:45:50 moving on
14:45:51 #topic projects releases or stable backports
14:46:02 marios: yep i'll
14:46:04 EmilienM: did the newton/ocata/pike releases go out?
14:46:07 this week
14:46:10 k
14:46:12 i'm still on it
14:46:15 probably today
14:46:22 any stable backports folks want attention on?
14:46:37 i've also asked to switch the scenario upgrade jobs to non-voting for ocata
14:46:44 to free up stable/ocata reviews
14:46:55 well, this one is merging in stable/newton https://review.openstack.org/#/c/510738/ hope the release includes it :D
14:47:01 #link https://review.openstack.org/#/q/status:open+topic:tripleo-upgrade-jobs-nv
14:47:49 jaosorior: we release every 2 weeks btw
14:48:03 or at least will try to :D
14:48:09 gates willing
14:48:15 hello all. does anyone have a multi-region config with OOO?
14:48:17 thanks
14:48:36 moving on to specs
14:48:41 #topic specs
14:48:41 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:48:49 reminder queens-1 next week, so we should be merging specs
14:48:56 please take some time to review the open specs
14:49:16 anyone have a spec (other than the already mentioned logging one) that they'd like to bring attention to?
14:49:17 what happens in queens-2? do we postpone specs to rocky?
14:49:28 yes i think so
14:49:42 we might want to make this information more visible then
14:49:42 we've never managed to do that before due to lack of velocity in the specs repo
14:50:03 but worth a try I guess, particularly given the stabilization theme this cycle
14:50:30 given that we had many CI issues during queens-1 and have lost some time, we could defer the deadline to November 3
14:50:31 I'll send a note to the ML about it
14:50:33 and we can discuss it
14:50:50 so we have 3 weeks
14:51:00 after that, no more specs approved for queens
14:51:04 how does it sound?
14:51:18 EmilienM +1
14:51:28 well i wasn't going to move the specs until the end of queens-2
14:51:33 to at least give some time for review
14:51:39 but if you want to do it earlier that works for me
14:52:12 I think we wanted us to be a bit more conservative for queens
14:52:27 freezing queens specs earlier is imho part of this effort
14:52:30 k let's look at the schedule and i'll send an ML note
14:52:47 during Pike, folks sent Pike specs after m3 :D
14:52:51 https://releases.openstack.org/queens/schedule.html
14:52:52 we might want to fix that at least
14:53:40 so i think queens-2 is a decent spot to freeze the specs
14:54:16 it's not like we've been super hard on requiring specs as well
14:54:27 if we respect it, it's already a good progress comparing to newton & pike cycles
14:54:36 k i'll send an ML note
14:54:56 #action mwhahaha to send a note about spec reviews and spec freeze at queens-2
14:55:17 busy meeting today, so let's allow for some open discussion
14:55:18 #topic open discussion
14:55:24 anyone have anything else they want to talk about?
14:56:57 i'll take that as a no
14:57:02 thanks everyone
14:57:03 #endmeeting