16:01:46 #startmeeting openstack_ansible_meeting
16:01:47 Meeting started Tue Jul 23 16:01:46 2019 UTC and is due to finish in 60 minutes. The chair is mnaser. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:50 The meeting name has been set to 'openstack_ansible_meeting'
16:01:50 #topic rollcall
16:01:52 o/
16:01:56 o/
16:01:57 o/
16:02:05 o/
16:02:25 how's everyone?!
16:02:32 o/
16:02:41 * guilhermesp fixes deployments
16:02:56 fun :)
16:03:04 #topic office hours
16:03:53 personally
16:03:56 https://review.opendev.org/#/c/671783/
16:04:00 i love the -392 part :D
16:04:19 it is a lot :P
16:04:29 So I was thinking about dropping os-log-dir-setup.yml and the rsyslog client in terms of this
16:04:45 seems that we started the work of cleaning things up
16:05:19 yeah i think at this point rsyslog_client won't really do much if we never run it
16:05:26 i would even retire rsyslog_server unless someone wants to maintain it
16:05:31 but since it's still used by ceph, unbound and things like tempest, rally and the utility container...
16:05:44 utility needs rsyslog? o_O
16:06:03 not rsyslog but os-log-dir-setup.yml
16:06:08 ahhh ok
16:06:16 this is the thing that does the bind mounts right?
16:06:17 I mixed up things a bit :(
16:06:26 yep
16:06:33 i was always wondering why we just didn't log things inside the container and kill all those complicated bind mounts
16:08:47 it's probably historical, so you can just collect up whatever is in /openstack/logs/* and not worry about which containers you have
16:08:54 probably for compressing without entering the container
16:08:57 I thought it was so we could find everything in one place?
16:09:37 but since the integrated tests, which are actually metal ones, everything is already in one place
16:10:15 o/
16:10:26 evrardjp!!!!
16:10:46 spotz: ! :)
16:11:02 mnaser: love the - in -392
16:12:46 o/
16:12:48 mnaser: yes, historical. I would prefer removing all the complex bind mounts too, as this was a pain to deal with. Cleaning this up would also simplify the code further
16:13:45 there is something that collects container journals on the host anyway isn't there?
16:13:47 so, the thing is that we don't need that almost anywhere, except 3 playbooks
16:14:01 so with the move to all journals there is now little point in keeping the bind mounts
16:14:32 yeah i agree with all of this so maybe it would be a good clean up
16:14:45 btw also knock on wood our CI has been relatively stable recently
16:14:51 im pretty happy with where it is
16:15:06 the upgrade jobs need work unfortunately
16:15:37 yeah.
16:15:44 we've decreased coverage though :(
16:16:07 if ppl want to help on increasing coverage, I have plenty of ideas, so little time.
16:16:17 evrardjp: ^ can you define what you mean by that, as there are several different things
16:18:20 well I think the first thing to do is to match what we removed -- so define new "specific" jobs having a pre-run play configuring the o_u_c or user_variables, to get feature parity back
16:18:36 then I guess the idea would be to implement multi-node jobs in periodics
16:20:22 Yeah, since we might be missing system packages in roles
16:20:50 we won't catch this if they are already installed by a previous role
16:21:01 would we get coverage again if we run both container and metal on every change?
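To ground the earlier point that moving everything to journald leaves little reason to keep the per-container log bind mounts, a minimal sketch of reading a container's logs from the host with journald only. The container name, journal directory and time window below are illustrative, and this assumes the container journals are persistent and readable from the host:

    # Read a container's journal directly if it is registered with systemd-machined
    # (the container name here is just an example):
    journalctl --machine aio1_utility_container-0123abcd --since "1 hour ago"
    # Or point journalctl at a persisted journal directory on the host
    # (the path here is illustrative, not a guaranteed OSA layout):
    journalctl --directory /openstack/log/aio1_utility_container-0123abcd/journal --since "1 hour ago"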
16:21:12 i mean im not opposed to it but we need to figure out why centos takes stupidly long with lxc
16:21:16 it's like 2h20 to run an aio
16:21:35 i would like to see haproxy back in some form
16:22:08 yes with the bind-to-mgmt stuff you're doing
16:22:10 it'll be running fine
16:22:19 the current metal jobs are fast but they are not sufficiently real life
16:22:42 i'm a bit stuck on the galera stuff there
16:25:12 jrosser: it's odd that it just works for us, i can try to help with looking at the fails
16:25:16 did you end up doing the my.cnf adjustments?
16:25:25 i did, and it's inconsistent
16:26:00 here's the change for the client my.cnf https://review.opendev.org/#/c/672101/
16:26:24 jrosser: lol
16:26:26 you're gonna hate me
16:26:40 jrosser: check the review :p
16:26:43 i figured i may have messed it up
16:27:13 yeah
16:27:14 aaaaaahhhhhh crap :)
16:27:42 thank you :)
16:27:44 * jrosser fixes
16:28:09 Jonathan Rosser proposed openstack/openstack-ansible-galera_client master: Default to connecting clients via ip rather than local socket https://review.opendev.org/672101
16:28:22 so that should hopefully help
16:28:48 mnaser: it won't be enough, and I didn't intend to run centos+lxc :)
16:29:07 evrardjp: well i figured we'd run all the operating systems we cover
16:29:14 I just wanted to have like mariadb cluster testing + keystone and stop there
16:30:00 that probably can be wired up in logic
16:30:09 we already have all we need
16:30:17 yes but writing the 'dependency' system
16:30:19 just change affinity
16:30:35 what do you mean?
16:31:08 when testing os_keystone then grab what services os_keystone needs (i.e. galera, memcache, etc)
16:31:13 the idea to not run centos+lxc was just to have scenarios (ubuntu+lxc is the most frequent one)
16:31:38 when testing os_nova then grab its dependencies which are os_neutron, os_keystone, os_glance
16:31:45 whose dependencies are... etc
16:31:46 to reduce our run times
16:31:47 that's kinda what's done in tests/role/bootstrap
16:32:03 but I understand it would be smarter if we do it that way
16:32:17 I thought this could be done using a CLI :)
16:32:21 mhmm
16:32:31 i think we should have full coverage, if we do aio_lxc, we do it for all supported systems
16:32:38 just encapsulate the logic there, instead of relying on so many conditionals in ansible
16:32:41 imho we either do it or drop it
16:32:56 or otherwise we'll get a change that breaks a role for centos that won't be caught there
16:33:02 and then it'll be broken in integrated
16:33:45 I meant to keep centos for baremetal, so it would still be tested. Just keeping the most common use cases
16:33:59 but I understand your point
16:34:38 i feel like with a little bit more effort we'd understand the fundamental reason why centos is sooooo slow
16:34:47 i think we've regressed something
16:34:50 it never took this long before
16:35:22 the data is all there in the ARA reports / db
16:35:38 to decide if it's specific things that are slow, or it's just across the board
16:35:44 across the board
16:35:46 every operation is slower
16:35:49 like 3-4x slower
16:35:54 even simple things
16:36:09 does that still stand outside CI?
16:36:12 Could our connection module be affecting it?
16:36:36 i haven't tried outside CI, i thought about our connection module but figured it would regress in both OSes?
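For context on the client change discussed above (connecting via IP rather than the local socket), a rough sketch of the difference between the two connection paths. The VIP address, port and credentials are placeholders, not the role's actual defaults:

    # Default behaviour on the same host: the client goes over the local unix socket.
    mysql --protocol=socket -u root -p -e 'SHOW STATUS LIKE "wsrep_cluster_size";'
    # Forcing a TCP connection to the internal VIP instead, which is the behaviour
    # the change defaults to (address is a placeholder):
    mysql --protocol=tcp --host=172.29.236.101 --port=3306 -u root -p -e 'SHOW STATUS LIKE "wsrep_cluster_size";'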
16:36:54 o/
16:36:59 bonjour
16:37:39 i've been talking on IRC for like, at least a week, and just realized nothing was going through
16:37:45 lolol
16:37:52 my feelings were hurt for a bit
16:37:55 lol
16:38:34 yeah i dont know
16:38:39 for the centos stuff
16:38:44 it deffo needs some profiling
16:39:01 it'd be nice to get to nspawn and not have to deal with that but
16:39:29 we bit off too much there in one go
16:39:43 yea macvlan+nspawn together is a lot
16:39:47 nspawn + macvlan is too much
16:39:48 yes
16:40:00 bridge+nspawn is easier to consume but i dunno if i have the cycles to help with it right now
16:40:25 i think there may be (was?) a limitation with nspawn and the number of interfaces you could create
16:41:30 evrardjp: did you do some work on ansible profiling?
16:41:39 long ago
16:41:44 dw is better :D
16:41:50 i was just looking for dw but he's not in #mitogen
16:42:07 don't want to waste a bunch of time learning 10 wrong tools when someone can just say "do this"
16:42:18 that reminds me I need to connect to that channel since I reconfigured my bouncer
16:42:24 jrosser: totally
16:42:35 just wait for him, last time he was super helpful to me
16:42:53 maybe ping him on twitter?
16:45:43 done
16:46:47 cloudnull: jrosser needs +w on this https://review.opendev.org/#/c/672225/
16:46:47 so we'll keep improving and cleaning things up, the journald clean-up seems to be going well
16:47:18 chandankumar: done
16:47:25 cloudnull: thanks :-)
16:47:59 OH
16:48:01 also
16:48:06 did y'all see my email to the ML
16:48:10 about openstack-ansible-collab
16:48:52 hectic days, just saw an email around, but I will take a look
16:50:24 Just a heads up, but the unicast flood issue that was brought up last week is related to a change in os-vif introduced in Stein. See: https://bugs.launchpad.net/os-vif/+bug/1837252.
16:50:24 Launchpad bug 1837252 in neutron "IFLA_BR_AGEING_TIME of 0 causes flooding across bridges" [Undecided,Incomplete]
16:50:27 chandankumar: did you find a solution for your tempest undefined var?
16:51:15 jamesdenton: affecting lxb only?
16:51:31 jrosser: the above changes worked here https://review.opendev.org/#/c/672231/, but I need to come up with a better solution
16:51:33 it affects the qbr bridges, too, with OVS. Just not sure what the overall effect is there
16:51:56 it might be happening due to mixing venv and distro stuff
16:52:37 jrosser: sorry, wrong review
16:52:58 jrosser: https://review.opendev.org/#/c/667219/
16:57:30 evrardjp: word according to dw: "'perf record -g ansible-playbook ...' of the ansible run /and/ separately on the host using simply 'perf record' followed by 'perf report' might show something obvious"
16:57:51 namrata: did you want to ask about upgrades?
16:58:08 yeah I was waiting for open discussion
16:58:22 jrosser: oh yeah that rings a bell :)
16:58:31 Hi, I would like to contribute to openstack ansible and I can start with the issue which I faced while upgrading R->S, i.e. upgrading the WSREP SST method from xtrabackup-v2 to mariabackup.
16:58:31 namrata: just ask :)
16:59:02 jrosser: could it be the fact we just added all those plays, and maybe there is cruft in the inventory?
16:59:05 I haven't checked tbh
16:59:29 jrosser suggested to take this up in the meeting so we can discuss how to handle it
16:59:36 mnaser: ^ so for fixing up the R-S upgrade for galera, do we make patches to master? i wasn't totally clear where we do that
16:59:42 evrardjp: ^ maybe you can advise too
17:00:11 well, is S-master broken too?
17:00:28 if you start on S you are already on the new replication method
17:00:33 ok
17:00:55 so it's the upgrade to S only, so it's only implementable in stein
17:01:07 you got your answer? :p
17:01:30 i guess it's made more complicated by not having a working R-S upgrade CI job :/
17:01:38 well, unless we do, of course
17:02:07 yeah, a stable-only patch
17:02:09 okay so I should push to stable/stein then
17:02:55 namrata: ok so that sounds like your answer, write something that goes onto stable/stein
17:03:08 jrosser: thanks
17:03:10 and thanks for taking the time to fix it up :)
17:03:28 :]
17:15:41 #endmeeting
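As a follow-up to the R->S galera question above, a hedged sketch of how one might confirm which SST method a node is actually running before and after the stable/stein fix. The config path is illustrative and no specific role variable name is assumed here:

    # Check the SST method currently in effect on a galera node:
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_sst_method';"
    # A Rocky-built node typically reports xtrabackup-v2, while a Stein cluster
    # config sets it to mariabackup, e.g. a line like:
    #   wsrep_sst_method = mariabackup
    grep -r wsrep_sst_method /etc/mysql/   # config location is illustrative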
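And for reference on the centos slowness thread, the profiling dw suggested earlier in the log would look roughly like this; the playbook name, output file names and sampling window are placeholders:

    # Profile the ansible run itself:
    perf record -g -o perf-ansible.data -- ansible-playbook setup-hosts.yml
    perf report -i perf-ansible.data
    # Separately, take a system-wide sample on the host while the run is slow:
    perf record -a -g -o perf-host.data -- sleep 120
    perf report -i perf-host.data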