16:01:54 #startmeeting fuel
16:01:55 Meeting started Thu Aug 21 16:01:54 2014 UTC and is due to finish in 60 minutes. The chair is mihgen. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:56 hi
16:01:56 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:56 hi
16:01:57 hi
16:01:59 The meeting name has been set to 'fuel'
16:02:07 hey
16:02:22 today vkozhukalov is on vacation so I have to run all these commands )
16:02:33 who else is here?
16:02:37 hi
16:02:38 o/
16:02:40 hi
16:02:40 hi
16:02:40 o/
16:02:45 hi
16:02:45 * dilyin here
16:02:49 hi
16:02:59 good
16:03:03 hi
16:03:05 hi
16:03:09 let's go over the agenda
16:03:13 #link https://etherpad.openstack.org/p/fuel-weekly-meeting-agenda
16:03:20 hi
16:03:23 #topic 5.1 release status (Fuel, mos-openstack, mos-linux)
16:03:55 so folks, we still have bugs, yes, and still can't meet the HCF criteria
16:04:29 for Fuel we have Galera issues and many patching issues, as well as a few other things
16:05:08 I think I'll pass the floor to folks to talk about Fuel issues first, and then we switch to mos-openstack/mos-linux
16:05:19 #topic Galera issues
16:05:33 holser: please give us the latest status on this
16:05:49 #link https://bugs.launchpad.net/bugs/1354479
16:05:50 Launchpad bug 1354479 in fuel "Galera is not syncing on the slaves sometimes" [Critical,In progress]
16:06:12 #link https://bugs.launchpad.net/bugs/1355162
16:06:13 mihgen: Finally I found the issue with Galera
16:06:16 Launchpad bug 1355162 in fuel "[library] MySQL Galera is not operable after controllers hard reset" [High,Confirmed]
16:06:38 They were caused by high memory consumption, which caused heavy swap in/swap out
16:06:53 is it the only reason?
16:07:14 I made a review to switch from mysqldump to xtrabackup and slightly decreased RAM
16:07:19 and that helped
16:07:38 so now I am running BVT tests with my patch
16:07:59 holser: do we need to increase ram size for BVT/Fuel CI jobs?
16:08:06 only that, but xtrabackup helps as it's much faster, so it's enough not to time out the deployment
16:08:13 mihgen: we do
16:08:33 teran_: did you agree on that? ^^
16:08:48 as I showed, complex deployments with neutron+gre consume up to 3GB RAM
16:09:28 ps ax -O rss | awk '{ sum += $2 } END {print "Total memory usage =", sum/1024, "MB"}'
16:09:29 Total memory usage = 2691.69 MB
16:09:44 mihgen: we've already increased ram size for tests to 2.5gb
16:09:51 That's after deployment, as during deployment it was 3Gb
16:09:58 xarses also mentioned before that we need 2 vCPUs on test VMs
16:10:11 with one vCPU, there's too much context switching
16:10:16 2 vCPU should help also
16:10:32 nurla: so should we get 3gb instead and 2 vcpu?
16:10:54 at least for now, before we improve consumption if possible
16:11:15 mihgen: I'd prefer we start to use xtrabackup, I had experience with that tool, it's faster than mysqldump
16:11:29 one more thing from xarses (he's having trouble getting online): we should enable KSM on jenkins slaves
16:11:38 teran_: we will, but as I understood we still need 3gb
16:11:40 teran_: +1 https://review.openstack.org/#/c/109606/
16:12:04 this requirement will affect vbox users
16:12:16 also there was a suggestion from xdeller in the mailing list about in-memory compression
16:12:24 maybe we need to consider that as well
16:12:43 mihgen: yup, that's what KSM is for, RAM deduplication
16:12:44 holser: it's actually great that you nailed that down
16:13:12 KSM can be very CPU heavy
16:13:21 teran_: is it in your todo list to take a look at this?
16:13:29 it might impact performance
16:13:55 we have it enabled on the dev server in MTV, it's not too bad wrt CPU
16:13:59 I'd use huge pages instead for our libvirt virtual instances
16:14:02 maybe the devops team can try it and see how it works
16:14:08 mihgen: about 3g - currently it's possible but it could make us close our eyes on some performance issues
16:14:09 +1 hugepages
16:14:37 holser: what's the decision, 3gb or 2.5 is ok?
16:14:44 although we could have a better gain with KSM, worth checking
16:14:47 2.5 should be ok
16:14:58 let's leave 2.5GB for now
16:15:03 +1
16:15:06 ok. cool
16:15:22 let's move on
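For reference, a minimal sketch of the mysqldump-to-xtrabackup SST switch discussed above; wsrep_sst_method/wsrep_sst_auth are the standard Galera options, but the config path and credentials below are placeholders, not the values from holser's actual review:

    # illustrative only; the real change goes through the review mentioned above
    cat >> /etc/mysql/conf.d/wsrep.cnf <<'EOF'
    [mysqld]
    wsrep_sst_method = xtrabackup-v2           # was: mysqldump; streams the datadir instead of dumping SQL, much faster
    wsrep_sst_auth   = sst_user:sst_password   # xtrabackup-based SST needs a MySQL account on the donor node
    EOF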
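And a rough sketch of what enabling KSM on a jenkins slave typically looks like (assumes a Linux kernel built with CONFIG_KSM; the scan-rate values are illustrative, not tuned; qemu/libvirt marks guest memory as mergeable, so the deduplication applies to the test VMs):

    echo 1 > /sys/kernel/mm/ksm/run                # start the ksmd scanner
    echo 1000 > /sys/kernel/mm/ksm/pages_to_scan   # pages scanned per wake-up (higher = more CPU)
    echo 200 > /sys/kernel/mm/ksm/sleep_millisecs  # pause between scans
    cat /sys/kernel/mm/ksm/pages_sharing           # how many pages are currently deduplicated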
16:15:33 #topic Patching status
16:15:34 but the question with 2 vCPU is still open
16:15:45 teran_: pls consider 2 vcpu
16:15:48 mihgen: I saw the offer about RAM compression - that should help, so yeah it's in my todo :)
16:15:57 kk :)
16:16:11 patching shows new areas of issues
16:16:16 mihgen: ok
16:17:05 deductively we've discovered that python deps of openstack are not going to be updated during patching
16:17:20 so basically oslo.messaging is not updated on the node
16:17:37 just because puppet doesn't know that it should update this package
16:17:43 and it's not updated by the deps tree
16:17:54 dilyin: what's our action plan on this?
16:18:24 ikalnitsky: any more news on other issues related to patching?
16:18:28 we have decided to add the missing package installation to Fuel together with the service notification
16:18:29 that's what we get for using puppet instead of apt/yum to manage dependencies
16:18:56 mihgen: yep. here's the status
16:19:04 bogdando has made a good patch to integrate the oslo.messaging class into controller and compute
16:19:08 dilyin: provide a link pls
16:19:09 The ceilometer issue:
16:19:09 #link https://bugs.launchpad.net/fuel/+bug/1354494
16:19:09 the fix is merged, but afaik QA still has some troubles with ceilometer. Need to investigate.
16:19:10 Launchpad bug 1354494 in fuel/5.1.x "Puppet fails during updating ceilometer node" [Critical,Fix committed]
16:19:17 The murano dashboard issue:
16:19:18 #link https://bugs.launchpad.net/fuel/+bug/1355180
16:19:18 The patch is already done, but we haven't tested it yet.
16:19:19 Launchpad bug 1355180 in fuel/6.0.x "rollback will fail - migration rollback is impossible - need to backup databases" [High,Confirmed]
16:19:23 We finally resolved the issue with Ubuntu HA.
16:19:23 #link https://bugs.launchpad.net/fuel/+bug/1356873
16:19:23 the fix isn't merged yet, but I made some tests and it works.
16:19:24 Launchpad bug 1356873 in fuel "[Update] Patching on Ubuntu nova ha failed with Unmet dependencies" [Critical,In progress]
16:19:33 As for this
16:19:34 #link https://bugs.launchpad.net/fuel/+bug/1355180
16:19:34 it was confirmed from the mos side that there are no migrations between 5.0 and 5.0.2. So we can target it for 6.0 and won't fix it in 5.1.
16:19:43 https://review.openstack.org/#/c/116011/
16:19:46 that's all :)
16:20:02 ok, thanks
16:20:24 I'm wondering about the current testing coverage, how deep did we go, what else to expect
16:20:29 nurla: tatyana ^^
16:21:05 angdraug, actually we use both puppet and apt/yum and collect troubles from both sides)
16:21:06 nurla: not as deep as we want - we need to improve our tests
16:21:21 to verify if all deps are updated
16:21:50 also we should cover vcenter and cli
16:22:11 and measure downtimes
16:22:13 ok, what about other stuff, like nova-network, murano/sahara
16:22:34 covered
16:22:45 tatyana: oh yes, we need to know what happens under real load by rally and patching at the same time
16:22:48 tatyana: good
16:22:54 we need to run destructive tests as well on patched ha
16:23:45 today we extend our swarm to run patching against all tests
16:23:45 ok if we turn off the primary controller and run patching )
16:24:01 mihgen: O_0
16:24:20 mihgen: don't do it, i'm scared
16:24:22 and after cluster shutdown too)
16:24:35 ohh. too many cases
16:24:50 all right, anything else on patching?
16:24:58 dpyzhov: what about the size of tarballs?
16:25:19 sorry it was not me, but my IRC client )
16:25:28 Well, we can win 2Gb with lrzip
16:25:40 but it takes 15 minutes to unpack
16:26:09 so, a wash
16:26:09 dpyzhov: what about using lrzip only on the final tarball?
16:26:10 I would say better to make it smaller at the expense of time (consider you'll also save time downloading the upgrade package)
16:26:11 I'm playing with fdupes in order to add hardlinks into the tarball
16:26:13 it's possible to provide only incremental repo updates. it will make update tarballs many times smaller. but we have no facilities to generate such diff files yet
16:26:49 incremental updates are out of scope for 5.1
16:26:58 It's too dangerous
16:27:05 We will get new bugs
16:27:05 yes, fdupes can hardlink the same file very well. it will free a lot of space on the master node but will not help with the update tarball size
16:27:10 yep, let's be realistic for 5.1
16:27:24 dilyin: we have duplicates in the tarball
16:27:28 if we can do the hardlinks way in 5.1, that would be awesome
16:27:36 because we have two centos repos and two ubuntu repos
16:28:04 ok so dpyzhov - you are trying to get hardlinks working for us?
16:28:19 But it looks like lrzip is still better than fdupes
16:28:30 mihgen: yes
16:28:34 they are better together
16:28:44 dilyin: looks like not
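For reference, a rough sketch of the two tarball-size options discussed above; the repo path and tarball name are placeholders, and hard-link support in fdupes depends on the version (some builds need a separate hardlink tool instead):

    fdupes -rm /var/www/nailgun    # read-only summary of how much space duplicate files take
    lrzip fuel-upgrade.tar         # long-range dedup + lzma, produces fuel-upgrade.tar.lrz
    lrzip -d fuel-upgrade.tar.lrz  # decompression is the slow part (~15 min reported above)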
16:28:58 ok let's move on folks
16:29:14 #topic remove public IPs from slave nodes
16:29:27 Latest ISO is http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_master_iso/65/
16:29:34 Option "Assign public network to all nodes" is on the Settings tab in the UI and it is disabled by default. I.e. only "controller" and "zabbix-server" nodes will have the Public network by default.
16:29:42 Both HA and non-HA deployments are under testing.
16:29:46 there were a lot of complaints about having public IPs assigned to all nodes, even when not needed
16:29:53 Current problems are: nodes' default gateways point to the master node's IP (Sergey V. to figure out what to do with that), issues with ubuntu/neutron-vlan/ceph deployments (https://bugs.launchpad.net/fuel/+bug/1359834 - not yet reproduced on the last ISO)
16:29:54 Launchpad bug 1359834 in fuel "[custom iso] Ceph volumes don't work on ubuntu/neutron-vlan" [High,Incomplete]
16:30:14 xenolog: did you figure out what to do with the default gateway?
16:30:36 akasatkin: cinder is fixed? it was using the public net (wrongly)
16:30:48 yes, fixed
16:31:01 ok good
16:31:25 is this feature a risk for the 5.1 schedule?
16:31:28 ceph also works on the last ISO. we just started with this configuration on 65
16:31:52 angdraug: should actually be done
16:32:09 but didn't check other cfg's on the last ISO yet
16:32:12 for Ceph we still need to merge https://review.openstack.org/115728
16:32:31 otherwise it will raise HEALTH_WARN on every deployment
16:32:57 I can propose two ways:
16:32:57 1. simple — set the default gateway to the master node or to another node that the customer can define
16:32:57 2. more powerful — make a virtual router, based on the controllers and managed by pacemaker.
16:33:59 the 1st way requires a lot of changes in nailgun/fuelweb
16:34:30 why a lot of changes? should not be so
16:34:33 in the 1st
16:34:39 the 2nd way requires only puppet-part changes
16:35:20 the 1st way should be way simpler. but we still need a public IP gateway for nova-network
16:35:30 ok let's take it out of the meeting
16:35:39 and discuss in #fuel-dev or the ML if needed
16:35:54 I hope it can be done quickly, otherwise we will have to roll back all of this
16:36:03 xenolog: working with multiple-cluster networks, I found that l23network will set whatever it's passed, we just need to fix nailgun to pass the network you want as default
16:36:09 ok I think we should move on
16:36:17 we also need to get a gateway router from the user on the other network
16:36:22 mihgen:
16:36:22 1. a field for IP address definition in fuel-web
16:36:22 2. a network checker that verifies that the defined router is actually a router
16:36:22 3. new variables in astute.yaml
16:37:01 xenolog: ohh
16:37:03 ok
16:37:13 let's move on
16:37:23 #topic mos-openstack bugs
16:37:32 dmitryme: around to provide status?
16:37:46 the 2nd way looks more reliable
16:37:51 mihgen: yep, I am here
16:38:10 xenolog: yeah, but it seems complicated
16:38:20 because it doesn't require anything from the customer infrastructure
16:38:24 basically we have two critical bugs left: fixing the keystone and memcache interaction
16:38:46 and fixing the oslo.messaging <-> rabbitmq interaction
16:38:59 any progress/estimates on both?
16:39:03 the first bug is in the process of debugging
16:39:15 I hope 1-2 days
16:39:20 mihgen: more complicated to implement, but simplest to use.
16:39:32 I guess the first way is better. Just set the route to the IP provided by the user. If the master node goes down it's not a big problem. Internet connection from computes is not strictly required
16:39:46 as for the second one, I've already merged the old fix we used in 5.0.1 and right now we are testing a more proper fix
16:39:46 dilyin: +1
16:39:59 xenolog: dilyin but looks like we need to talk about it over email :)
16:40:11 s/I've/we've/
16:40:14 dmitryme: ok thanks
16:40:18 it wasn't me :-)
16:40:20 dmitryme: I hope it's gonna be soon too..
16:40:36 anything else?
16:40:49 mihgen: I will discuss it, probably the old fix is good enough for 5.1
16:40:56 dilyin:
16:40:56 > If master node goes down it's not a big problem
16:40:56 the master node CAN go down — that's the biggest problem!!!
16:41:02 mihgen: nope, nothing else
16:41:22 because we have min. 3 controllers and only one master node.
16:41:33 xenolog, why? what important services are left there?
16:41:44 xenolog: we have no deps on the master node, only NTP and partly DNS
16:41:46 topic is mos-openstack?
16:41:54 and that's what we must fix, hopefully in 6.0
16:41:58 should be easy
16:42:02 dmitryme: thanks
16:42:08 moving on
16:42:14 oh, I think I should have mentioned the issue which should be fixed by the sqlalchemy upgrade, it has 'high' status, the fix is under review right now
16:42:26 link?
16:42:35 dmitryme: ok
16:42:36 if something can be broken - it is when something breaks.
16:42:40 #topic mos-linux bugs
16:43:31 msemenov: around?
16:43:51 any status on the rabbitmq upgrade, iptables.. ?
16:44:09 mihgen: here
16:44:10 #link https://bugs.launchpad.net/fuel/+bug/1359096
16:44:12 Launchpad bug 1359096 in fuel "Build iptables 1.4.11 for centos" [High,Confirmed]
16:44:27 we also still need yet another ceph update:
16:44:32 #link https://bugs.launchpad.net/fuel/+bug/1341009
16:44:35 Launchpad bug 1341009 in fuel/5.0.x "[osci] obsolete ceph package in fuel-repository for 5.0.1 and 5.1" [Critical,Fix released]
16:44:41 DNS — this is not enough.
16:44:42 request https://gerrit.mirantis.com/#/c/21088
16:44:53 we have +1 from Evgeny Li
16:45:00 so it seems we can merge
16:45:56 angdraug: is it that critical with ceph?
16:46:04 was the python issue resolved?
16:46:21 I'm a bit afraid of any pkg upgrades if possible
16:46:28 xarses: what python issue?
16:46:30 mihgen: it's not critical but it's high priority
16:46:37 maybe this one? https://bugs.launchpad.net/mos/+bug/1342068
16:46:38 Launchpad bug 1342068 in mos "syslog logging to /dev/log race condition" [Critical,Fix committed]
16:46:39 angdraug: even high..
16:46:42 there are upstart and radosgw related fixes in that version
16:46:50 msemenov: correct
16:47:07 xarses: not reproduced with the fix. So moved to fix committed
16:47:22 if we're close to HCF I guess we can stay with ceph 0.80.4, but upstream highly recommends an upgrade
16:47:34 msemenov: ok, it should be in today's ISO?
16:47:52 xarses: sure
16:47:53 nurla: do we have confirmation that the python /dev/log problem is solved now?
16:48:14 msemenov: ok, I will retest my case that caused it to occur all the time
16:48:14 xarses: yes, patched python packages hit the 5.1 mirrors yesterday
16:48:32 angdraug: no, we haven't
16:48:54 angdraug: I would skip it if possible, too many things, and we have 3 QA going on vacation next week
16:49:01 because nova and neutron issues blocked us
16:49:08 mihgen: ok :(
16:49:26 as per the conversation with D.Borodaenko, there should be 100% cpu load for services writing to /dev/log
16:49:51 after restarting rsyslog many times (even during deployment)
16:49:51 if you restart syslog while they are writing
16:50:00 and we don't see it
16:50:37 we don't have an ISO from today that would have passed centos bvt, why?
16:50:38 We will retest today and mark it fix released if I can't see it anymore
16:51:47 xarses: and if the bug is still here, please provide detailed repro steps in the issue
16:51:58 yes
16:52:25 nurla: any comment on centos bvt?
16:52:46 nurla: we need a bug about it. every build hangs
16:52:55 ok
16:53:16 ok, anything else to bring up for mos-linux?
16:53:19 at first look, an issue with galera
16:53:50 link?
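A rough reproduction sketch for the /dev/log race discussed above (assumes a deployed node with rsyslog and the standard procps tools; the iteration count is arbitrary and not the exact scenario xarses uses):

    for i in $(seq 1 20); do service rsyslog restart; sleep 2; done
    # after the restarts, look for OpenStack services spinning near 100% CPU
    ps -eo pcpu,pid,comm --sort=-pcpu | head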
16:54:07 #topic 6.0 plans and beyond
16:54:30 so we've discussed it a bit previously, just repeating that the main goal is to get Juno working
16:55:00 the fact that we still don't test openstack master with the current puppet manifests puts our releasing before the design summit at risk
16:55:14 so we should start doing that ASAP and collaborate across teams
16:55:29 we can run a few things in parallel
16:55:47 for now though, anyone who can help with reaching HCF should do that.
16:56:10 that's it from my side. any questions/suggestions?
16:56:56 actually I forgot about 5.0.2
16:57:04 #topic 5.0.2 milestone
16:57:07 #link https://launchpad.net/fuel/+milestone/5.0.2
16:57:27 there are a number of bugs over there. We must keep watching them too
16:57:49 should be like 95% backports to the stable branch
16:58:13 #topic other questions
16:59:41 Last week I missed a chance to answer a question about an open blueprint status.
16:59:44 looks like no questions except some we ran in parallel in the office)
16:59:45 Added in-progress and todo work items to the blueprint (https://blueprints.launchpad.net/fuel/+spec/fuel-web-docs-dev-env-restructure)
16:59:48 Will change the Fuel pages on the OpenStack wiki today/tomorrow. First just re-ordering and improving the organization. Then I'll go through and add/update content over the next few days.
16:59:51 christopheraedo: oh yeah
17:00:12 christopheraedo: very good
17:00:13 time
17:00:13 thanks
17:00:20 ok guys, thanks
17:00:26 see you next meeting
17:00:32 bb
17:00:36 #endmeeting