#openstack-meeting-alt log

16:01:54 <mihgen> #startmeeting fuel
16:01:55 <openstack> Meeting started Thu Aug 21 16:01:54 2014 UTC and is due to finish in 60 minutes.  The chair is mihgen. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:56 <tatyana> hi
16:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:56 <akislitsky> hi
16:01:57 <meow-nofer> hi
16:01:59 <openstack> The meeting name has been set to 'fuel'
16:02:07 <mihgen> hey
16:02:22 <mihgen> today vkozhukalov is on vacation so I have to run all these commands )
16:02:33 <mihgen> who else is here?
16:02:37 <dpyzhov> hi
16:02:38 <angdraug> o/
16:02:40 <rmoe> hi
16:02:40 <sambork> hhi
16:02:40 <ikalnitsky> o/
16:02:45 <christopheraedo> hi
16:02:45 * dilyin here
16:02:49 <msemenov> hi
16:02:59 <mihgen> good
16:03:03 <asyriy> hi
16:03:05 <vkramskikh> hi
16:03:09 <mihgen> let's go over agenda
16:03:13 <mihgen> #link https://etherpad.openstack.org/p/fuel-weekly-meeting-agenda
16:03:20 <agordeev> hi
16:03:23 <mihgen> #topic 5.1 release status (Fuel, mos-openstack, mos-linux)
16:03:55 <mihgen> so folks we still have bugs, yes, and still can't meet HCF criteria
16:04:29 <mihgen> for Fuel we have Galera issues, and many patching issues, as well as a few more other things
16:05:08 <mihgen> I think I would pass a voice to folks to talk about Fuel issues first, and then we switch to mos-openstack/mos-linux
16:05:19 <mihgen> #topic Galera issues
16:05:33 <mihgen> holser: please provide us with latest status on this
16:05:49 <mihgen> #link https://bugs.launchpad.net/bugs/1354479
16:05:50 <uvirtbot> Launchpad bug 1354479 in fuel "Galera is not syncing on the slaves sometimes" [Critical,In progress]
16:06:12 <mihgen> #link https://bugs.launchpad.net/bugs/1355162
16:06:13 <holser> mihgen: Finally I found the issue with Galera
16:06:16 <uvirtbot> Launchpad bug 1355162 in fuel "[library] MySQL Galera is not operable after controllers hard reset" [High,Confirmed]
16:06:38 <holser> They were caused by high memory consumption caused high swap/in/swap out
16:06:53 <mihgen> is it the only reason?
16:07:14 <holser> I made a review to switch from mysqldump to xtrabackup and slightly decreased RAM
16:07:19 <holser> and that helped
16:07:38 <holser> so now I am making BVT tests with my patch
16:07:59 <mihgen> holser: do we need to increase ram size for BVT/Fuel CI jobs?
16:08:06 <holser> only that, but xtrabackup helps as it’s much faster so it’s enough not to timout deployment
16:08:13 <holser> mihgen: we do
16:08:33 <mihgen> teran_: did you agreed on that? ^^
16:08:48 <holser> as I showed that comlex deployments with neutron+gre cconsumes up to 3GB RAM
16:09:28 <holser> ps ax -O rss | awk '{ sum += $2 } END {print "Total memory usage =", sum/1024, "MB"}'
16:09:29 <holser> Total memory usage = 2691.69 MB
16:09:44 <nurla> mihgen: we've already increase ram size for tests to 2.5gb
16:09:51 <holser> That’s after deployment, as during deployment it was 3Gb
16:09:58 <angdraug> xarses also mentioned before that we need 2 vCPUs on test VMs
16:10:11 <angdraug> with one vCPU, there's too much context switching
16:10:16 <holser> 2 vCPU should help also
16:10:32 <mihgen> nurla: so should we get 3gb instead and 2 vcpu?
16:10:54 <mihgen> at least for now before we improve consumption if possibl
16:11:15 <teran_> mihgen: I'd prefer we start to use xtrabackup, I had expirience with that tool it's faster the mysqldump
16:11:29 <angdraug> one more thing from xarses (he's having trouble getting online): we should enable KSM on jenkins slaves
16:11:38 <mihgen> teran_: we will but as I understood we still need 3gb
16:11:40 <holser> teran_: +1 https://review.openstack.org/#/c/109606/
16:12:04 <nurla> this requirements will be affect vbox users
16:12:16 <mihgen> also there was suggestion xdeller in mailing list about in-memory compression
16:12:24 <mihgen> might be we need to consider that as well
16:12:43 <angdraug> mihgen: yup, that's what KSM is for, RAM deduplication
16:12:44 <mihgen> holser: it's actually great that you nailed that down
16:13:12 <tzn> SKM can be very CPU heavy
16:13:20 <tzn> s/SKM/KSM/
16:13:21 <mihgen> teran_: in your todo list to take a look on this?
16:13:29 <tzn> it might inpact performance
16:13:55 <angdraug> we have it enabled on the dev server in MTV, it's not too bad wrt CPU
16:13:59 <holser> I’d use huge pages instead for our libvirt virtual instances
16:14:02 <mihgen> may be devops team can try it and see how it works
16:14:08 <teran_> mihgen: about 3g - currently it's possible but it could make us close eyes on sime performance issues
16:14:09 <tzn> +1 hugepages
16:14:37 <mihgen> holser: what's the decision, 3gb or 2.5 is ok?
16:14:44 <tzn> althoug we could have better gain with KSM, worth checking
16:14:47 <holser> 2.5 should be ok
16:14:58 <holser> let’s leave 2.5GB for now
16:15:03 <nurla> +1
16:15:06 <mihgen> ok. cool
16:15:22 <mihgen> let's move on
16:15:33 <mihgen> #topic Patching status
16:15:34 <holser> but question with 2 vCPU is still open
16:15:45 <mihgen> teran_: pls consider 2 vcpu
16:15:48 <teran_> mihgen: I saw offer obout RAM compression - that should help, so yeah it's in my todo :)
16:15:57 <mihgen> kk :)
16:16:11 <mihgen> patching shows new areas of issues
16:16:16 <teran_> mihgen: ok
16:17:05 <mihgen> deductively we've discovered that python deps of openstack are not gonna be updated during pathing
16:17:20 <mihgen> so basically oslo.messaging is not updated on node
16:17:37 <mihgen> just because puppet doesn't know that it should update this package
16:17:43 <mihgen> and it's not updated by deps tree
16:17:54 <mihgen> dilyin: what's our action plan on this?
16:18:24 <mihgen> ikalnitsky: any more news on other issues related to patching?
16:18:28 <dilyin> we have decided to add missing package installation to Fuel together with service notification
16:18:29 <angdraug> that's what we get for using puppet instead of apt/yum to manage dependencies
16:18:56 <ikalnitsky> mihgen: yep. here's the status
16:19:04 <dilyin> bogdando have made a good patch to integrate oslo.messaging class to controlelr and compute
16:19:08 <mihgen> dilyin: provide a link pls
16:19:09 <ikalnitsky> The ceilometer issue:
16:19:09 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1354494
16:19:09 <ikalnitsky> the fix is merged, but afaik QA still have some troubles with ceilometer. Need to investigate.
16:19:10 <uvirtbot> Launchpad bug 1354494 in fuel/5.1.x "Puppet fails during updating ceilometer node" [Critical,Fix committed]
16:19:17 <ikalnitsky> The murano dashboard issue:
16:19:18 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1355180
16:19:18 <ikalnitsky> The patch is already done, but we don't test it yet.
16:19:19 <uvirtbot> Launchpad bug 1355180 in fuel/6.0.x "rollback will fail - migration rollback is impossible - need to backup databases" [High,Confirmed]
16:19:23 <ikalnitsky> We finally resolve the issue with Ubuntu HA.
16:19:23 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1356873
16:19:23 <ikalnitsky> the fix isn't merged yet, but I made some tests and it works.
16:19:24 <uvirtbot> Launchpad bug 1356873 in fuel "[Update] Patching on Ubuntu nova ha failed with Unmet dependencies" [Critical,In progress]
16:19:33 <ikalnitsky> As for this
16:19:34 <ikalnitsky> #link https://bugs.launchpad.net/fuel/+bug/1355180
16:19:34 <ikalnitsky> it was confirmed from the mos side that there're no migrations between 5.0 and 5.0.2. So we can target it for 6.0 and won't fix in 5.1.
16:19:43 <dilyin> https://review.openstack.org/#/c/116011/
16:19:46 <ikalnitsky> that's all :)
16:20:02 <mihgen> ok, thanks
16:20:24 <mihgen> I'm wondering on the current testing coverage, how deep did we go, what else to expect
16:20:29 <mihgen> nurla: tatyana ^^
16:21:05 <dilyin> angdraug, actually we use both puppet and apt/yum and collect troubles from both sides)
16:21:06 <tatyana> nurla: not so deep aswe want - we need to improve our test
16:21:21 <tatyana> to verify if all deps are updated
16:21:50 <nurla> also we should covered vcenter and cli
16:22:11 <tatyana> and meausre downtimes
16:22:13 <mihgen> ok, what about other stuff, like nova-network, murano/sahara
16:22:34 <tatyana> covered
16:22:45 <mihgen> tatyana: oh yes, we need to know what happens under real load by rally and patching at the same time
16:22:48 <mihgen> tatyana: good
16:22:54 <tatyana> we need to run destructive tess as well on patched ha
16:23:12 <tatyana> test
16:23:45 <nurla> today we extend our swarm for run patching against all tests
16:23:45 <mihgen> ok if we turn off primary controller and run patching )
16:24:01 <nurla> mihgen: O_0
16:24:20 <ikalnitsky> mihgen: don't do it, i'm scared
16:24:22 <nurla> and after shutdown cluster too)
16:24:35 <mihgen> ohh. too many cases
16:24:50 <mihgen> all right, anything else on patching?
16:24:58 <mihgen> dpyzhov: what about size of tar balls?
16:25:19 <mihgen> sorry it was not me, but my IRC client )
16:25:28 <dpyzhov> Well, we can win 2Gb with lrzip
16:25:40 <dpyzhov> but it takes 15 minutes to unpack
16:26:09 <xarses> so, a wash
16:26:09 <ikalnitsky> dpyzhov: what about using lrzip only on final tarball ?
16:26:10 <christopheraedo> I would say better to make it smaller at the expense of time (consider you'll also save time downloading the upgrade package)
16:26:11 <dpyzhov> I’m playing with fdupes in order to add hardlinks into tarball
16:26:13 <dilyin> it's possible to provide only incremental repo updates. it will make update tarballs many times smaller. but we have no facilities to generate such diff files yet
16:26:49 <dpyzhov> incremental updates are out of scope for 5.1
16:26:58 <dpyzhov> It’s too dangerous
16:27:05 <dpyzhov> We will get new bugs
16:27:05 <dilyin> yes fdupes can hardlink same file very well. it will free a lot of space on master node but will not help with update tarball size
16:27:10 <mihgen> yep let's be realistic for 5.1
16:27:24 <dpyzhov> dilyin: we have duplicates in tarball
16:27:28 <mihgen> if we can do hardlinks way in 5.1, that would be awesome
16:27:36 <dpyzhov> because we have two centos repos and two ubuntu repos
16:28:04 <mihgen> ok so dpyzhov - you are trying to get hardlinks working for us?
16:28:19 <dpyzhov> But looks like lrzip still better than fdupes
16:28:30 <dpyzhov> mihgen: yes
16:28:34 <dilyin> they are better together
16:28:44 <dpyzhov> dilyin: looks like not
16:28:58 <mihgen> ok let's move on folks
16:29:14 <mihgen> #topic remove public IPs from slave nodes
16:29:27 <akasatkin> Latest ISO is http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_master_iso/65/
16:29:34 <akasatkin> Option "Assign public network to all nodes" is on the Settings tab in UI and it is disabled by default. I.e. only "controller" and "zabbiz-server" nodes will have Public network by default.
16:29:42 <akasatkin> Both HA and non-HA deployments are under testing.
16:29:46 <mihgen> there were a lot of complaints about having public IPs assigned to all nodes, even when not needed
16:29:53 <akasatkin> Current problems are: nodes' default gateways point to the master node's IP (Sergey V. to figure out what to do with that), issues with ubuntu/neutron-vlan/ceph deployments (https://bugs.launchpad.net/fuel/+bug/1359834 - not yet reproduced on the last ISO)
16:29:54 <uvirtbot> Launchpad bug 1359834 in fuel "[custom iso] Ceph volumes don't work on ubuntu/neutron-vlan" [High,Incomplete]
16:30:14 <mihgen> xenolog: did you figure out what to do with default gateway?
16:30:36 <mihgen> akasatkin: cinder is fixed? it was using public net (wrongly)
16:30:48 <akasatkin> yes, fixed
16:31:01 <mihgen> ok good
16:31:25 <angdraug> is this feature a risk for 5.1 schedule?
16:31:28 <akasatkin> ceph also works on last ISO. we just started with this configuration on 65
16:31:52 <mihgen> angdraug: should be actually done
16:32:09 <akasatkin> but didn't check other cfg's on last ISO yet
16:32:12 <angdraug> for Ceph we still need to merge https://review.openstack.org/115728
16:32:31 <angdraug> otherwise it will raise HEALTH_WARN on every deployment
16:32:57 <xenolog> I can propose two way:
16:32:57 <xenolog> 1. simple  — set default gateway to the master node or to another node, that customer can define
16:32:57 <xenolog> 2. more_powerful — make virtual router, based on controller's and managed by pacemaker.
16:33:59 <xenolog> 1-st way required lot of changes in nailgun/fuelweb
16:34:30 <mihgen> why a lot of changes? should not be so
16:34:33 <mihgen> in 1st
16:34:39 <xenolog> 2-nd way —required  only puppet-part changes
16:35:20 <mihgen> 1st way should be a way simpler. but we need public IPs gateway for nova-network still
16:35:30 <mihgen> ok let's take it out of the meeting
16:35:39 <mihgen> and discuss in #fuel-dev or ML if needed
16:35:54 <mihgen> I hope it can be quickly done, otherwise we will have to go back with all of this
16:36:03 <xarses> xenolog: working with multiple-cluster networks, i found that l23network will set what ever its passed, we just need to fix nailgun to pass the network you want for default
16:36:09 <mihgen> ok I think we should move on
16:36:17 <xarses> we also need to get a gateway router from the user on the other network
16:36:22 <xenolog> mihgen:
16:36:22 <xenolog> 1. field for IP address definition in fuel-web
16:36:22 <xenolog> 2. network checker, that monitor, that defined router — is a router
16:36:22 <xenolog> 3. new variables in astute.yaml
16:37:01 <mihgen> xenolog: ohh
16:37:03 <mihgen> ok
16:37:13 <mihgen> let's move on
16:37:23 <mihgen> #topic mos-openstack bugs
16:37:32 <mihgen> dmitryme: around to provide status?
16:37:46 <xenolog> 2-nd way looks like more relaible
16:37:51 <dmitryme> mihgen: yep, I am here
16:38:10 <mihgen> xenolog: yeah but seems complicated
16:38:20 <xenolog> because don't required from curtomer infrastructure
16:38:24 <dmitryme> basically we have two critical bugs left: fixing keystone and memcache interaction
16:38:46 <dmitryme> and fixing oslo.messaging <-> rabbitmq interaction
16:38:59 <mihgen> any progress/estimates on both?
16:39:03 <dmitryme> for the first bug it is in the process of debugging
16:39:15 <dmitryme> I hope 1-2 days
16:39:20 <xenolog> mihgen: more complicated in implementing, but simpleest in using.
16:39:32 <dilyin> I guest the first way is better. Just set route to the IP provided by user. If master node goes down it's not a big problem. Internet connection from computes is not strictly required
16:39:46 <dmitryme> as for the second one, I’ve already merged the old fix we used in 5.0.1 and right now are testing more proper fix
16:39:46 <mihgen> dilyin: +1
16:39:59 <mihgen> xenolog: dilyin but looks like we need to talk about it over email :)
16:40:11 <dmitryme> s/I’ve/we’ve/
16:40:14 <mihgen> dmitryme: ok thanks
16:40:18 <dmitryme> it wasn’t me :-)
16:40:20 <mihgen> dmitryme: I hope it's gonna be soon too..
16:40:36 <mihgen> anything else?
16:40:49 <dmitryme> mihgen: I will discuss it, probably the old fix is good enough for 5.1
16:40:56 <xenolog> dilyin:
16:40:56 <xenolog> > If master node goes down it's not a big problem
16:40:56 <xenolog> master node CAN go down — it's i bigest problem!!!
16:41:02 <dmitryme> mihgen: nope, nothing else
16:41:22 <xenolog> because we have min.3 controller and only one master node.
16:41:33 <dilyin> xenolog, why? what importnat services are left there?
16:41:44 <mihgen> xenolog: we have no deps on master node, only NTP and partly DNS
16:41:46 <angdraug> topic is mos-openstack?
16:41:54 <mihgen> and that's what we must fix, hopefully in 6.0
16:41:58 <mihgen> should be easy
16:42:02 <mihgen> dmitryme: thanks
16:42:08 <mihgen> moving on
16:42:14 <dmitryme> o, I think I should have mentioned issue which should be fixed by sqlalchemy upgrade, it has ‘high’ status, the fix is under review right now
16:42:26 <angdraug> link?
16:42:35 <mihgen> dmitryme: ok
16:42:36 <xenolog> if something can иу икщлут - it is when something breaks.
16:42:40 <mihgen> #topic mos-linux bugs
16:42:53 <xenolog> s/иу икщлут/be broken/g
16:43:00 <nurla> sed
16:43:31 <mihgen> msemenov: around?
16:43:51 <mihgen> any status on rabbitmq upgrade, iptables.. ?
16:44:09 <msemenov> mihgen: here
16:44:10 <mihgen> #link https://bugs.launchpad.net/fuel/+bug/1359096
16:44:12 <uvirtbot> Launchpad bug 1359096 in fuel "Build iptables 1.4.11 for centos" [High,Confirmed]
16:44:27 <angdraug> we also still need yet another ceph update:
16:44:32 <angdraug> #link https://bugs.launchpad.net/fuel/+bug/1341009
16:44:35 <uvirtbot> Launchpad bug 1341009 in fuel/5.0.x "[osci] obsolete ceph package in fuel-repository for 5.0.1 and 5.1" [Critical,Fix released]
16:44:41 <xenolog> DNS  — this is not enough.
16:44:42 <msemenov> request https://gerrit.mirantis.com/#/c/21088
16:44:53 <msemenov> we have +1 from Evgeny Li
16:45:00 <msemenov> so it seems we can merge
16:45:56 <mihgen> angdraug: is it that critical with ceph?
16:46:04 <xarses> was the python issue resolved?
16:46:21 <mihgen> I'm afraid a bit of any pkgs upgrades if possible
16:46:28 <mihgen> xarses: what python issue?
16:46:30 <angdraug> mihgen: it's not critical but it's high priority
16:46:37 <msemenov> may be this one? https://bugs.launchpad.net/mos/+bug/1342068
16:46:38 <uvirtbot> Launchpad bug 1342068 in mos "syslog logging to /dev/log race condition" [Critical,Fix committed]
16:46:39 <mihgen> angdraug: even high..
16:46:42 <angdraug> there's upstart and radosgw related fixes in that version
16:46:50 <xarses> msemenov: correct
16:47:07 <msemenov> xarses: not reproduced with the fix. So moved to fix committed
16:47:22 <angdraug> if we're close to HCF I guess we can stay with ceph 0.80.4, but upstream highly recommends an upgrade
16:47:34 <xarses> msemenov: ok, it should be in today's ISO?
16:47:52 <msemenov> xarses: sure
16:47:53 <angdraug> nurla: do we have confirmation that python /dev/log problem is solved now?
16:48:14 <xarses> msemenov: ok, I will retest my case that caused it to occur all the time
16:48:14 <angdraug> xarses: yes, patched python packages have hit the 5.1 mirrors yesterday
16:48:32 <nurla> angdraug: no, we havn't
16:48:54 <mihgen> angdraug: I would skip it if possible, too many things, and we have 3 QA going on vacation next week
16:49:01 <nurla> because nova and neutron issues blocked us
16:49:08 <angdraug> mihgen: ok :(
16:49:26 <msemenov> as from the conversation with D.Borodaenko, there should be 100% cpu load for services writing to .dev.log
16:49:51 <msemenov> after restarting rsyslog many times(even during deployment)
16:49:51 <xarses> if you restart syslog while they are writing
16:50:00 <msemenov> and we dont see it
16:50:37 <angdraug> we don't have an ISO from today that would have passed centos bvt, why?
16:50:38 <xarses> We will retest today and mark it fix released if i can't see it anymore
16:51:47 <msemenov> xarses: and if the bug is still here, please provide deatiled repro steps in the issue
16:51:58 <xarses> yes
16:52:25 <mihgen> nurla: any comment on centos bvt?
16:52:46 <mihgen> nurla: we need bug about it. every build hangs
16:52:55 <nurla> ok
16:53:16 <mihgen> ok anything else to bring with mos-linux?
16:53:19 <nurla> at first look, issue with galera
16:53:50 <msemenov> link?
16:54:07 <mihgen> #topic 6.0 plans and beyond
16:54:30 <mihgen> so we've discussed it a bit previously, just repeating that the main goal is to get Juno working
16:55:00 <mihgen> the thing that we still don't test openstack master with current puppet manifests puts us under risk releasing before design summit
16:55:14 <mihgen> so we should start doing that ASAP and collaborate across teasm
16:55:29 <mihgen> we can run a few things in parallel
16:55:47 <mihgen> for now though anyone who can help with reaching HCF should do that.
16:56:10 <mihgen> that's it from my side. any questions/suggestions?
16:56:56 <mihgen> actually forgot about 5.0.2
16:57:04 <mihgen> #topic 5.0.2 milestone
16:57:07 <mihgen> #link https://launchpad.net/fuel/+milestone/5.0.2
16:57:27 <mihgen> there are a number of bugs over there. We must keep watching them too
16:57:49 <mihgen> should be like 95% of back ports to stable branch
16:58:13 <mihgen> #topic other questions
16:59:41 <christopheraedo> Last week I missed a chance to answer question about an open blueprint status.
16:59:44 <mihgen> looks like no questions except some we ran in parallel in the office)
16:59:45 <christopheraedo> Added in progress and todo work items to the blueprint (https://blueprints.launchpad.net/fuel/+spec/fuel-web-docs-dev-env-restructure)
16:59:48 <christopheraedo> Will change Fuel pages on OpenStack wiki today/tomorrow. First just re-ordering and improving the organization. Then I'll go through and add/update content over the next few days.
16:59:51 <mihgen> christopheraedo: oh yeah
17:00:12 <mihgen> christopheraedo: very good
17:00:13 <angdraug> time
17:00:13 <mihgen> thanks
17:00:20 <mihgen> ok guys thanks
17:00:26 <mihgen> see you next meeting
17:00:32 <tatyana> bb
17:00:36 <mihgen> #endmeeting