14:00:23 #startmeeting tripleo
14:00:28 Meeting started Tue Nov 8 14:00:23 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:31 o/
14:00:32 The meeting name has been set to 'tripleo'
14:00:36 hey all, who's around?
14:00:40 hello
14:00:44 o/
14:00:44 hey
14:00:56 Please add any one-off items to https://etherpad.openstack.org/p/tripleo-meeting-items
14:00:59 o/
14:01:04 Hi!
14:01:06 o/
14:01:15 \o
14:01:16 o/
14:01:29 o/
14:01:34 o/
14:01:50 o/
14:01:54 #topic agenda
14:01:54 * one-off agenda items
14:01:54 * bugs
14:01:54 * Projects releases or stable backports
14:01:54 * CI
14:01:55 o/
14:01:57 * Specs
14:01:59 * open discussion
14:02:03 hey!!! TripleOers!
14:02:26 o/
14:02:40 Ok then, hi all - let's get started!
14:02:45 o/
14:02:52 o/
14:03:00 I don't see any one-off items except the one I added re Ocata-1, which we can cover in the project releases standing item
14:03:16 does anyone have anything to add before we get into the recurring items?
14:04:04 Alright, could be a short meeting today then :)
14:04:16 o/
14:04:26 w00t
14:04:30 #info skipping one-off items as there aren't any
14:04:35 #topic bugs
14:04:52 #link https://bugs.launchpad.net/tripleo/
14:05:13 So, it's been another bad week for CI-impacting bugs, thanks to everyone for efforts resolving them
14:05:35 https://bugs.launchpad.net/tripleo/+bug/1638908 is still unresolved AFAIK
14:05:35 Launchpad bug 1638908 in tripleo-quickstart "Overcloud deployment fails in minimal configuration with ('Connection aborted.', BadStatusLine("''",))" [Undecided,In progress] - Assigned to Alfredo Moralejo (amoralej)
14:05:41 has anyone had any luck locally reproducing that?
14:05:49 I hit it once, but was then unable to reproduce
14:06:03 not me, plan to retry tonight though
14:06:19 there's a theory that increasing haproxy timeouts will help, but I'm not yet clear if that's the full story
14:06:19 I have never been able to reproduce that one outside of CI
14:07:31 a long time ago, in one of my tests, I saw this, and it was because haproxy wasn't sending the proper headers
14:07:31 ya it only happens with ssl, so haproxy timeouts could help
14:07:54 and as trown said, on ssl only
14:08:08 I hit it without ssl, POSTing to swift locally, but that could have been a different issue which just caused the same cryptic low-level python error
14:08:20 there is a bigger issue in that bug though (not CI impacting, but user impacting) in that the logging is pretty awful
14:09:01 yeah, there are no swift logs at all, even with undercloud_debug = true
14:09:09 so we can probably fix that at least
14:09:27 Ok then, let's move on, but if anyone has any more clues please do update the bug, thanks!
14:10:34 https://bugs.launchpad.net/tripleo/+bug/1604927 is another critical issue we don't seem to have a handle on yet
14:10:34 Launchpad bug 1604927 in tripleo "Deployment times out due to nova-conductor never starting" [Critical,Triaged]
14:10:43 bnemec: any further clues on that one?
14:11:29 shardy: I haven't actually seen that recently. We could probably close it for now.
14:11:43 bnemec: ack, please do if you're happy it's gone, thanks!
14:12:05 Anyone else have bugs they want to highlight before we move on?
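[Editor's note: for context on the haproxy-timeout theory raised at 14:06:19, the change being floated is raising the client/server timeouts in the haproxy instance fronting the undercloud endpoints. A minimal sketch of that kind of tweak, assuming a plain haproxy.cfg; in TripleO this file is generated by puppet, and the values below are illustrative assumptions, not the fix that eventually merged:

    defaults
        timeout connect 10s
        timeout client  2m   # raised from a shorter default (assumed)
        timeout server  2m   # raised from a shorter default (assumed)
]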
14:13:11 shardy me
14:13:42 related to HAproxy restarts on ControllerPrePuppet and ControllerPostPuppet
14:13:57 https://bugs.launchpad.net/tripleo/+bug/1640175
14:13:57 Launchpad bug 1640175 in tripleo "HAProxy doesn't load the new configuration never" [Undecided,In progress] - Assigned to Carlos Camacho (ccamacho)
14:14:13 just wanted to ask for more info about whether this can impact upgrades
14:14:55 bandini for HA and marios for upgrades, please, when you have some free cycles.
14:15:10 shardy, these two are also CI impacting: https://bugs.launchpad.net/tripleo/+bug/1639885 https://bugs.launchpad.net/tripleo/+bug/1639970
14:15:10 Launchpad bug 1639885 in tripleo "CI: pingtest timeouts cause by performance issues (redis, swift, ceiliometer)" [High,Triaged]
14:15:11 Launchpad bug 1639970 in tripleo "CI: cinder fails to allocate memory while creating volume for ping test tenant" [Critical,Confirmed]
14:15:42 ccamacho: certainly looks like it may, can you please add more details to the bug, then we can discuss further?
14:15:42 ccamacho: I am almost done with an escalation. happy to sync up in a bit?
14:16:04 shardy ack, I'll add more details there
14:16:33 thanks bandini!
14:16:41 ccamacho: sorry, chasing a BZ, reading back
14:17:19 Ok so bug #1639970 needs further investigation to see where/why we're using more memory
14:17:19 bug 1639970 in tripleo "CI: cinder fails to allocate memory while creating volume for ping test tenant" [Critical,Confirmed] https://launchpad.net/bugs/1639970
14:17:41 do we know if https://review.openstack.org/#/c/394548/ fixes bug 1639885 or if further work is needed?
14:17:41 bug 1639885 in tripleo "CI: pingtest timeouts cause by performance issues (redis, swift, ceiliometer)" [High,Triaged] https://launchpad.net/bugs/1639885
14:18:12 shardy, no, it doesn't fix it
14:18:24 shardy, the performance issue is still there
14:18:27 ccamacho: ok, I guess you will add more info there? seems very new, do we have a BZ for that (we can take it offline after the meeting too). Haven't heard of anyone hitting that yet but we should find out more (I mean for upgrades)
14:18:42 sshnaidm: Ok, can you please add more details, as "various performance issues" isn't that actionable
14:18:46 thanks :)
14:19:01 marios ack, https://bugzilla.redhat.com/show_bug.cgi?id=1390962
14:19:01 bugzilla.redhat.com bug 1390962 in rhel-osp-director "HAProxy doesn't load the new configuration after scaling out the role running the Openstack API services" [Urgent,Assigned] - Assigned to ccamacho
14:19:13 ccamacho: ty
14:19:19 launchpad bugs here please ;)
14:19:37 shardy oops
14:19:38 Ok then, any further bugs or shall we continue?
14:20:25 #topic Projects releases or stable backports
14:20:33 shardy: sorry, my fault, I asked for that. we do always link to LP from the BZ though, where appropriate
14:20:49 Ok, two things to discuss here: slagle is planning a stable/newton release tomorrow
14:21:12 and we need to release ocata-1 next week (I'm happy to coordinate that unless anyone else wants to)
14:21:23 slagle: what's the status of the newton release, are we in good shape for tomorrow?
14:21:31 any outstanding backports need review attention?
14:21:47 I'd like to add this one (only to newton, not to master) https://review.openstack.org/394980
14:21:52 slagle, marios: ^
14:22:26 bandini: thanks (for the galera issue, looks like)
14:22:36 marios: totally not galera btw ;)
14:22:46 shardy: this is another one that was filed moments ago https://review.openstack.org/#/c/394968/ which we'll need in newton
14:23:34 marios: ack
14:23:36 bandini: k :D well the symptom was galera at least
14:23:42 bandini: will check the review, thanks
14:23:49 This one as well, a missed parameter in the generated passwords: https://review.openstack.org/#/c/394493/
14:24:08 marios: np ;)
14:25:09 shardy: another one here https://review.openstack.org/#/c/389830/
14:25:23 shardy: everything for newton won't be merged by tomorrow
14:25:36 we can always do another release though
14:25:40 slagle: Yeah, I'm assuming we may need to do another one
14:25:54 but we can try to push any patches passing CI in during the rest of today, I guess
14:25:57 ya releases are fairly inexpensive
14:26:21 yea so I'll request the release of what we've got tomorrow
14:26:43 Ok then, sounds like the newton release is under control, thanks!
14:26:49 #link https://launchpad.net/tripleo/+milestone/ocata-1
14:26:57 168 bugs targeted :-O
14:27:29 I'm going to start deferring all bugs to ocata-2 later this week unless they're assigned and high/critical priority
14:27:51 we'll aim to cut the ocata-1 release next week (shall we say Wednesday again?)
14:28:04 wednesday seems good
14:28:14 sounds reasonable
14:28:14 jpich: what's the status of the tripleo-ui CI job?
14:28:26 https://blueprints.launchpad.net/tripleo/+spec/tripleo-ui-basic-ci-check
14:28:29 shardy: Last patch is ready for review
14:28:40 https://review.openstack.org/#/c/390845/
14:28:51 Ok, thanks, let's see if we can get that solitary blueprint landed this week then ;)
14:28:56 :)
14:29:18 Should we use a gerrit topic again to help focus reviews?
14:29:38 e.g. if folks have release-blocker bugs and they're targeted to ocata-1, tag the patches with tripleo/ocata1?
14:29:48 I found that helpful in the run-up to the newton release
14:29:57 +1 that is really helpful
14:30:15 should probably be tripleo/ocata-1
14:30:19 for consistency
14:31:21 Ok then, let's do that, but FWIW I'll probably prefer deferring things to ocata-2 wherever possible given the huge number of outstanding bugs
14:32:28 Feel free to help by deferring bugs to ocata-2 if you think they can wait
14:32:58 I have been filing new bugs targeted at ocata-2 already
14:32:58 Anything else related to releases before we continue?
14:33:24 ++ yeah please don't target any new bugs to ocata-1 unless they're super critical
14:33:31 I'll just write a script that defers them ;)
14:34:28 #topic CI
14:34:45 Ok, who wants to give an update on the current status of CI?
14:35:06 I know things are looking a lot more green now, and we've talked about a few remaining CI-impacting bugs
14:35:53 I'm interested to discuss how we can more effectively triage/assign CI-related issues
14:36:24 stop all other work
14:36:34 :)
14:36:43 I personally would love if we could dedicate a deep dive to CI. I try to help but I am often a little confused by the whole CI topic ;)
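[Editor's note: two practical follow-ups on the release discussion above. The agreed review topic can be set when pushing a backport with git-review, e.g. `git review -t tripleo/ocata-1`. And the bulk-deferral script shardy jokes about at 14:33:31 is straightforward with launchpadlib; a minimal, untested sketch, where the milestone names and the keep/defer policy are assumptions taken from the discussion, not an actual script from the meeting:

    # Sketch: defer unassigned or lower-priority tripleo bugs from ocata-1
    # to ocata-2, per the policy discussed above.
    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_with('tripleo-defer', 'production')
    tripleo = lp.projects['tripleo']
    ocata_1 = tripleo.getMilestone(name='ocata-1')
    ocata_2 = tripleo.getMilestone(name='ocata-2')

    for task in tripleo.searchTasks(milestone=ocata_1):
        # Keep bugs that are assigned and High/Critical; defer the rest
        if task.assignee and task.importance in ('Critical', 'High'):
            continue
        task.milestone = ocata_2
        task.lp_save()
        print('Deferred: %s' % task.bug.title)
]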
14:36:52 slagle: so, that's one option - but is it efficient to have *everyone* stop to look at the same issue?
14:37:06 sometimes there are multiple issues, and often there are critical bugs which sit unassigned
14:37:30 especially regarding the flow of fixes into rdo when the issue is not tht/tripleo specific
14:38:07 shardy: yea, I think it is more efficient
14:38:10 I personally think that as long as CI is as complex to debug as it is, the "everyone stop all other work" kind of approach is just a waste of time
14:39:09 I think we need some way to avoid the same folks always fixing CI, but which doesn't result in a 50% efficiency hit on all development
14:39:19 CI regressions happen almost every day lately
14:39:38 yeah
14:40:06 we probably need to start going back and doing proper RCAs on blocking CI issues to see what happened
14:40:16 part of the problem is that the people who do work on CI are consumed with just the firedrills/reactions
14:40:19 the big ones have been packaging issues that caused partial, not complete, failures
14:40:37 so there is little time left to work on things like documenting it for others or making it less complex
14:40:52 that's one of the ways that "stopping other work" would make us more efficient
14:41:04 slagle: I agree, I'm just saying that if 100% of the team are consumed by the same firedrills, that doesn't necessarily help
14:41:20 shardy: ++
14:41:32 but it's definitely a problem we need to address
14:41:39 shardy: yea, not saying we need everyone looking at the same issue at the same time
14:42:17 just that if CI is down... and it's already being worked on, maybe take that as an opportunity to look into how to avoid similar failures in the future
14:42:27 or improve something so that different people could help next time
14:42:51 or document the issue for a wider understanding, etc
14:42:54 slagle: cool, that was my initial interpretation of "stopping other work" - if we can figure out ways for more folks to help I'm 100% in favor of it, of course
14:43:38 Alright, perhaps we can continue this discussion on the ML as we'll run out of time here
14:43:49 #topic specs
14:44:04 So Emilien was talking about observing a spec freeze starting next week
14:44:10 just a follow-up from last week: the major undercloud upgrades job is merged and non-voting now
14:44:17 sorry, done with ci now :)
14:44:21 \o/
14:44:26 one thing that would be super helpful, for getting people to the level of being able to help (either on the firedrills or preventatively), would be some kind of summary of what caused it, how it was found, and how it was fixed
14:45:05 We've got a bunch of open spec reviews, please help with reviews, and I think we should start landing things which look in good shape with at least a couple of +2s
14:45:26 slagle: good news :)
14:45:31 i have added a BP - https://blueprints.launchpad.net/tripleo/+spec/tuned-nfv-dpdk - for DPDK performance; things are not clear yet, we are working with the performance team on it, but we don't want to miss the ocata cycle freeze, so raising a BP
14:45:53 I've written a couple of validation-related specs that could use attention: https://review.openstack.org/#/c/393281/ and https://review.openstack.org/#/c/393775/
14:45:59 (they affect tripleo-common)
14:46:10 jokke_: yeah, I think that's what mwhahaha was suggesting with the RCA comment; in theory that info should be in the bug report, but often it isn't
14:46:49 #action everyone to review all-the-specs ahead of proposed spec freeze
14:47:06 shardy: we will have more clarity in the coming week.
14:47:15 review all-the-specs++
14:47:37 Ok, let's try to land as much as possible, then re-assess in next week's meeting
14:47:44 thanks all
14:47:57 #topic Open Discussion
14:48:27 12 minutes to discuss other things (or continue to debate CI if you wish ;)
14:48:30 I wanted to mention something about bugs: if you spot something wrong, it would be helpful to create a bug. you don't have to fix it, but it gives other people a chance to work on it and lets them know there's an issue
14:48:40 +1
14:48:44 i've noticed many times people will create a bug right before proposing a patch
14:48:58 if at all :)
14:49:03 +1 that should already be happening in theory, but a good reminder, mwhahaha, thanks
14:49:25 just a friendly reminder :)
14:49:31 :)
14:49:43 Anyone have anything else to raise?
14:49:43 Example: I opened an ipv6 bug yesterday and beagles fixed it before I could. :-)
14:49:51 I'd like to point out that there's a bunch of open validation bugs free to take :-)
14:50:22 mwhahaha: +1
14:52:00 Ok, waiting 1 minute before declaring the meeting complete; anything else before we wrap things up?
14:53:03 thanks all!
14:53:08 #endmeeting
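[Editor's note: following mwhahaha's reminder at 14:48:30, bugs can be filed at https://bugs.launchpad.net/tripleo/+filebug, or programmatically via launchpadlib when reporting from tooling. A minimal, untested sketch; the title and description below are placeholders, not a real report:

    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_with('tripleo-bug-filer', 'production')
    bug = lp.bugs.createBug(
        target=lp.projects['tripleo'],
        title='Example: overcloud deploy fails during pingtest',
        description='Steps to reproduce, expected vs. actual behaviour, logs.')
    print(bug.web_link)
]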