Saturday, 2015-10-17

*** tpot has quit IRC00:02
*** cloudnautique has quit IRC00:03
SamYaplekfox1111: you still around?00:16
*** cemason has quit IRC00:26
*** vinkman has quit IRC00:30
kfox1111SamYaple: yeah.00:33
SamYaplehola00:34
SamYapleyou had so questions about recovery?00:34
kfox1111hi. :)00:34
SamYaplesome*00:34
kfox1111yeah. specifically, what the procedure is for galera to recover it from power failure.00:34
SamYaplethat is a bit tricky because that's dependent on galera00:35
kfox1111the docs for it mention doing some games finding the last written server and bringing that one up first with special args, then adding the rest.00:35
kfox1111but I'm not sure how that will work with the containers.00:35
SamYapleso the official way, if you are unaware, is to find the /var/lib/mysql/grastate.dat with the highest seqno00:35
*** dimsum__ has quit IRC00:35
SamYaplebut when it crashes that is sometimes -100:35
SamYaplebut basically you have to pick a node to start the cluster again with00:36
SamYapleideally the last node to shut down00:36
SamYaplethis can't be done automatically, so it will be the deployer's responsibility00:36
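A minimal Python sketch of the manual comparison described above, assuming the deployer has already copied the contents of /var/lib/mysql/grastate.dat from each node into a dict keyed by host name (the host names and file contents below are hypothetical). It only suggests a candidate: a seqno of -1 is treated as unknown, and if no node has a usable seqno it returns None, leaving the choice to the deployer (for example after recovering each node's position with mysqld --wsrep-recover).

```python
import re
from typing import Dict, Optional


def parse_seqno(grastate_text: str) -> Optional[int]:
    """Return the seqno recorded in a grastate.dat file, or None if missing."""
    match = re.search(r"^seqno:\s*(-?\d+)\s*$", grastate_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def pick_bootstrap_node(grastates: Dict[str, str]) -> Optional[str]:
    """Suggest the host with the highest seqno as the node to bootstrap from.

    A seqno of -1 means the node stopped without recording its position,
    so it never wins; if every node reports -1 (or nothing), return None.
    Ties go to the alphabetically first host.
    """
    best_host, best_seqno = None, -1
    for host, text in sorted(grastates.items()):
        seqno = parse_seqno(text)
        if seqno is not None and seqno > best_seqno:
            best_host, best_seqno = host, seqno
    return best_host


# Hypothetical contents collected from three controllers.
states = {
    "control01": "# GALERA saved state\nversion: 2.1\nuuid: 1234\nseqno: 1502\n",
    "control02": "# GALERA saved state\nversion: 2.1\nuuid: 1234\nseqno: -1\n",
    "control03": "# GALERA saved state\nversion: 2.1\nuuid: 1234\nseqno: 1497\n",
}
print(pick_bootstrap_node(states))  # control01
```

The function deliberately stops at a suggestion: as noted in the log, picking the node is the deployer's call, and a node that was already down before the outage can still look plausible on paper.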
kfox1111I'm guessing with powerfailure, they will be basically the same.00:36
SamYaplemaybe maybe not, what if one was down ahead of time anyway00:37
kfox1111but how do you start it back up with the containers? do you tweak a config file and docker start it back up, or do you use ansible?00:37
kfox1111ah. true.00:37
SamYaplesince we use xtrabackup for the configs, i do my backups with xtrabackup00:38
SamYapleit talks to mariadb over tcp so containers dont really play a part here00:38
kfox1111I don't follow.00:39
kfox1111you're backing up/restoring the data to a fresh cluster for recovery of power failures?00:40
SamYapleoh im sorry00:40
SamYaplei thought you asked how i back up the data00:40
SamYapleyou asked how to "start it back up"00:40
kfox1111right.00:40
kfox1111I see logic in the bootstrap code to do the startup dance, but not a way to do it otherwise.00:41
SamYapleno, to start the cluster fresh again, the easiest thing to do is make the gcomm:// list empty and start a container00:41
SamYapleyou can do this with the override file for a single node00:41
SamYapleits a bit of a dance, sure. but it only involves running the playbooks once if you do it right00:42
kfox1111so you do that on one of the hosts, and then docker start mariadb on that host first?00:42
SamYaplekfox1111: yes, but that would require COPY_ALWAYS as your config method00:42
SamYaplewithout that you have to run the playbooks twice00:42
SamYaplethe first time limited to a single host, the second time without the gcomm override on all hosts00:42
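A sketch of the two-pass variant described above (the case without COPY_ALWAYS), assuming a my.cnf-style override fragment and the standard ansible-playbook -i and --limit options. Where Kolla expects the override file is not shown here, and the inventory path inventory/multinode and playbook name site.yml are placeholders rather than confirmed Kolla paths. Port 4567 is Galera's default group-communication port. The bootstrap node gets an empty gcomm:// list; every other node lists its peers.

```python
import shlex
from typing import List

GALERA_PORT = 4567  # Galera's default group-communication port


def cluster_address(peers: List[str], bootstrap: bool) -> str:
    """Return the wsrep_cluster_address value for one node.

    An empty gcomm:// list makes Galera form a new cluster instead of
    waiting to join peers that are still down.
    """
    if bootstrap:
        return "gcomm://"
    return "gcomm://" + ",".join(f"{host}:{GALERA_PORT}" for host in peers)


def override_snippet(peers: List[str], bootstrap: bool) -> str:
    """my.cnf-style fragment for the single-node override (location not shown)."""
    return "[mysqld]\nwsrep_cluster_address = " + cluster_address(peers, bootstrap) + "\n"


def recovery_passes(first_node: str,
                    inventory: str = "inventory/multinode",  # placeholder path
                    playbook: str = "site.yml") -> List[str]:  # placeholder name
    """The two playbook runs.

    Pass 1 targets only the bootstrap node, with the empty-gcomm override
    in place. Remove the override, then run pass 2 against every host.
    """
    first = ["ansible-playbook", "-i", inventory, "--limit", first_node, playbook]
    second = ["ansible-playbook", "-i", inventory, playbook]
    return [shlex.join(first), shlex.join(second)]


peers = ["control01", "control02", "control03"]  # hypothetical hosts
print(override_snippet(peers, bootstrap=True), end="")
for command in recovery_passes("control01"):
    print(command)
```

Generating the commands rather than running them keeps the process manual, which matches the point in the log that this is a deployer-driven procedure.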
SamYapleas much as I want to make this automated, this is really a manual process for reasons discussed before00:43
kfox1111if its not copy always, then how does the data stay safe?00:43
kfox1111is the mariadb data in a separate volume?00:44
kfox1111I'm ok with it being manual. I just need it documented. :)00:44
SamYaplethere is a "data container" as docker has taken to calling it for the mysql data00:44
kfox1111ah. so you can rebuild the container leaving the data safe. ok.00:44
SamYaplekfox1111: i'll tell you what. if you file a bug stating lack of documentation in this area and assign it to me, i'll get this for you in writing by next week00:46
SamYaplei don't see the procedure as too much different from a normal recovery00:46
kfox1111Awesome. Thanks. :)00:46
SamYaplebut i get how it is a bit more complex00:46
kfox1111Yeah. but the potential for getting it wrong is "poof" :)00:47
SamYaplewell thats what backups are for00:47
SamYapleworst case you _can_ start a new cluster and restore backup00:47
SamYapleits really hard to get it wrong in my opinion.... if you have a proper backup00:47
kfox1111true. but I've never seen just how bad it is to regress your database a day or two when vm's are coming and going in the cloud. no idea what happens. :/00:48
kfox1111And I really really don't want to find out. :)00:48
SamYapleits not so bad. if the database is down you aren't creating new vms anyway00:48
kfox1111its the vm's that are created between the backup and outage that are lost track of?00:49
SamYapleand ideally if your database is down you keep access to your environment limited to readonly more or less until you know the maintenance is successful00:49
SamYaplei mean if the db is down you arent creating vms00:49
kfox1111tenants created, quotas changed, a lot of potential for skews that I don't think have any recovery mechanisms.00:49
SamYapleyou are talking about when you bring the db back up and then vms get created and then the db crashes00:49
SamYapleif the db is down there will not be any changes in your environment00:50
kfox1111no, I'm saying, if you take a backup at midnight, then it crashes at noon, you have half a day of your users doing things on the cloud that are lost if you have to rebuild the db from backups.00:50
*** alisonh has quit IRC00:50
SamYapleah yes00:50
SamYapleits not so bad actually. data is intact00:50
SamYaplenew changes are obviously lost00:51
SamYaplevms that have migrated are also a pain00:51
SamYaplebut with shared backend its pretty quick to fix that stuff00:51
SamYaplei'm not advocating for it, mind you :)00:51
kfox1111sure. :)00:51
SamYaplebut you can always dump the raw /var/lib/mysql/ folder before trying to bring the cluster back up00:52
kfox1111I've had to fix up nova quota miscalculations before. just something I hope to avoid whenever possible. :)00:52
kfox1111ah. thats true too.00:52
kfox1111a good note for the procedure too. :)00:52
kfox1111can you cow a docker volume?00:52
SamYaplenot that i'm aware of, but if your backend is a cow filesystem (btrfs, zfs) you can cow the subvolume00:53
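To make the copy-on-write option concrete, a sketch that only builds (does not run) the snapshot command for each backend. btrfs subvolume snapshot -r creates a read-only snapshot, and zfs snapshot takes a dataset@name argument; the subvolume path, dataset name and snapshot name in the usage lines are hypothetical.

```python
import shlex
from typing import List


def snapshot_command(backend: str, source: str, name: str) -> List[str]:
    """Build (but do not run) a point-in-time snapshot command.

    btrfs: source is the subvolume holding the data; a read-only snapshot
           is created next to it as <source>-<name>.
    zfs:   source is a dataset name; the snapshot is <source>@<name>.
    """
    if backend == "btrfs":
        return ["btrfs", "subvolume", "snapshot", "-r", source, f"{source}-{name}"]
    if backend == "zfs":
        return ["zfs", "snapshot", f"{source}@{name}"]
    raise ValueError(f"no copy-on-write snapshot support for {backend!r}")


# Hypothetical subvolume path and dataset name.
print(shlex.join(snapshot_command("btrfs", "/srv/mysql", "pre-recovery")))
print(shlex.join(snapshot_command("zfs", "tank/mysql", "pre-recovery")))
```

Returning an argv list keeps it easy to review before executing, e.g. with subprocess.run(cmd, check=True) once mysqld is stopped.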
larskskfox1111: Not directly, but if you were using host volumes you could do that manually via overlayfs...00:53
SamYaplelarsks: how would that look with overlayfs?00:53
kfox1111bummer, 'cause that would make it a really easy procedure: snapshot the data volume, then proceed to restart the cluster.00:54
larsksFor each container, you could make a new overlayfs mount with the same base directory.00:54
larsksAnd then mount the merged volume (so no container would mount the base)00:54
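larsks' overlayfs idea, sketched: one overlay mount per container over the same lower (base) directory, with each container receiving only its merged directory. The directory layout under root and the container names are hypothetical; the upper, work and merged directories must exist before mounting, and workdir has to be an empty directory on the same filesystem as upperdir.

```python
import os
from typing import Dict, List


def overlay_mounts(base: str, root: str, containers: List[str]) -> Dict[str, List[str]]:
    """Build one overlayfs mount command per container over a shared base.

    Each container gets private upper/work/merged directories under root.
    Writes go to that container's upper layer, so the base (lower) directory
    is never modified; only the merged directory is handed to the container.
    """
    commands = {}
    for name in containers:
        layer = os.path.join(root, name)
        upper = os.path.join(layer, "upper")
        work = os.path.join(layer, "work")
        merged = os.path.join(layer, "merged")
        options = f"lowerdir={base},upperdir={upper},workdir={work}"
        commands[name] = ["mount", "-t", "overlay", "overlay", "-o", options, merged]
    return commands


# Hypothetical paths and container name.
for name, argv in overlay_mounts("/srv/mysql-base", "/srv/overlays", ["mariadb"]).items():
    print(name, " ".join(argv))
```

As noted in the log, this is entirely manual: the mounts have to be created on the host before the container starts, and discarding a failed attempt means unmounting and deleting that container's upper directory.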
SamYapleeh. yea that's not so easy. seems like a greater risk to the data to try that00:55
kfox1111is the rabbit cluster totally stateless? is power-off recovery just starting it all back up?00:56
SamYaplekfox1111: ugh rabbitmq00:56
larsksWell...I don't know that it's at all risky, but it is certainly manual.00:56
SamYapleit should be but rabbitmq is horrible with clustering00:56
kfox1111yeah. probably safer to just tar dump the data volume.00:56
SamYaplei actually have a new clustering patch in the queue for rabbitmq which _would_ make it that way00:56
SamYaplehonestly with rabbitmq i would just commit to losing those messages if it doesn't start back up perfectly00:57
kfox1111Yeah, when completely powered off, I'd kind of expect it. I don't think openstack services really care?00:58
kfox1111should I adjust the bug report to ask for power-off/restart instructions for the whole cloud rather than just galera?00:58
SamYapleyoull lose recent messages00:58
kfox1111but the services will just make a new rpc call though?00:58
SamYapleso potentially some already-accepted API requests break (like volume deletions?)00:58
*** achanda has quit IRC00:59
kfox1111hmm.. could be.00:59
SamYaplewell if all of your control nodes failed, we are talking full datacenter power failure, no?00:59
kfox1111yeah. most likely.00:59
kfox1111we had one just the other day. :/00:59
kfox1111they happen rarely, but they do happen. :/00:59
SamYaplepresumably you put your control nodes on different power lines, so _all_ nodes would be down00:59
SamYapleso everything is starting fresh01:00
kfox1111right.01:01
kfox1111thats the case I'm worried about.01:01
SamYapleyea i think youll be more ok than you realize01:02
*** achanda has joined #kolla01:02
SamYapleive done a few full stop recoveries with kolla01:02
SamYaplenot so bad01:02
SamYaplebut its not at the 50-100 node scale01:02
SamYaplestill, the mechanics shouldn't change01:02
kfox1111Yeah. I totally believe it can be done. :)  just need it documented before my management will consider kolla production worthy enough to use. :/01:03
SamYaplei understand01:03
SamYaplejust throw that bug my way so i don't forget01:03
kfox1111yup. most of the way done with it. just a sec.01:04
*** unicell has quit IRC01:04
kfox1111Submitted:01:07
kfox1111https://bugs.launchpad.net/kolla/+bug/150706501:07
openstackLaunchpad bug 1507065 in kolla "documentation for power recovery" [Undecided,New]01:07
kfox1111having a problem assigning it to you though.01:07
*** alisonh has joined #kolla01:08
kfox1111Thanks for looking into it.01:08
*** daneyon_ has quit IRC01:15
*** achanda has quit IRC01:15
*** vinkman has joined #kolla01:39
*** vinkman has quit IRC01:40
*** vinkman has joined #kolla01:41
*** vinkman has quit IRC01:41
*** vinkman has joined #kolla01:42
*** vinkman has quit IRC01:44
*** dimsum__ has joined #kolla01:47
*** dimsum__ has quit IRC01:47
*** dimsum__ has joined #kolla01:48
*** tummy has joined #kolla02:04
openstackgerritKuo-tung Kao proposed openstack/kolla: add "registry" flag to "tools/build.py"  https://review.openstack.org/23462902:08
*** achanda has joined #kolla02:21
*** dimsum__ has quit IRC02:37
*** cemason has joined #kolla02:38
*** cemason has quit IRC02:43
*** tummy has quit IRC02:51
*** bmace has quit IRC02:53
*** bmace has joined #kolla03:08
*** bmace has quit IRC03:15
*** tummy has joined #kolla03:17
*** bmace has joined #kolla03:30
*** cloudnautique has joined #kolla03:32
*** tummy has quit IRC03:33
*** dimsum__ has joined #kolla03:38
*** dimsum__ has quit IRC03:43
*** bmace has quit IRC03:49
*** tummy has joined #kolla03:53
*** bmace has joined #kolla04:04
*** tummy has quit IRC04:06
*** bmace has quit IRC04:24
*** bmace has joined #kolla04:39
*** dimsum__ has joined #kolla04:41
*** dimsum__ has quit IRC04:46
*** dtturner has quit IRC04:50
*** unicell has joined #kolla05:32
*** CBR09 has joined #kolla05:34
*** exploreshaifali has joined #kolla05:51
*** asalkeld has quit IRC06:01
*** exploreshaifali has quit IRC06:36
*** exploreshaifali has joined #kolla06:41
*** achanda has quit IRC06:41
*** dimsum__ has joined #kolla06:44
*** achanda has joined #kolla06:45
*** dimsum__ has quit IRC06:49
*** achanda has quit IRC07:07
*** shardy_a1k has joined #kolla07:09
*** shardy_afk has quit IRC07:10
*** cloudnautique has quit IRC07:11
*** shardy_a1k has quit IRC07:14
*** shardy_afk has joined #kolla07:15
*** exploreshaifali has quit IRC07:16
*** exploreshaifali has joined #kolla07:29
*** jmccarthy has quit IRC07:30
*** jmccarthy has joined #kolla07:30
*** cemason has joined #kolla08:06
*** cemason has quit IRC08:10
openstackgerritMichal Rostecki proposed openstack/kolla: [WIP] Use trusts in heat.conf  https://review.openstack.org/23619808:27
openstackgerritMichal Rostecki proposed openstack/kolla: [WIP] Use trusts in heat.conf  https://review.openstack.org/23619808:29
*** pbourke has quit IRC08:39
*** pbourke has joined #kolla08:40
nihilifero/08:42
nihiliferis ansible-magnum blueprint free to assign?08:43
*** dimsum__ has joined #kolla08:46
*** dimsum__ has quit IRC08:52
*** exploreshaifali has quit IRC09:30
*** dimsum__ has joined #kolla09:49
*** dimsum__ has quit IRC09:53
*** cloudnautique has joined #kolla10:13
*** cloudnautique has quit IRC10:17
*** cemason has joined #kolla10:33
*** dwalsh has joined #kolla10:34
*** cemason has quit IRC10:37
SamYaplenihilifer: go for it10:52
openstackgerritMerged openstack/kolla: add "registry" flag to "tools/build.py"  https://review.openstack.org/23462910:55
*** achanda has joined #kolla11:07
*** achanda has quit IRC11:07
*** CBR09 has quit IRC11:23
openstackgerritSam Yaple proposed openstack/kolla: Remove vip for rabbitmq  https://review.openstack.org/23577711:39
*** exploreshaifali has joined #kolla11:49
*** dimsum__ has joined #kolla11:50
*** dimsum__ has quit IRC11:56
*** dwalsh has quit IRC12:18
*** diogogmt has quit IRC12:55
*** diogogmt has joined #kolla12:58
*** exploreshaifali has quit IRC13:01
*** dimsum__ has joined #kolla13:09
*** jainman has joined #kolla13:19
jainmanbug 1502633 - need help with a gerrit review13:20
*** jainman has quit IRC13:35
*** jainman has joined #kolla13:47
jainmanHi, having an issue committing a gerrit review - Bug https://bugs.launchpad.net/kolla/+bug/150263313:48
openstackLaunchpad bug 1502633 in kolla "connecting to rabbitmq returns ECONNREFUSED" [Critical,Invalid]13:48
*** jainman has quit IRC13:54
*** cloudnautique has joined #kolla14:35
*** sdake has joined #kolla14:54
*** cloudnautique has quit IRC14:55
*** sdake has quit IRC15:20
openstackgerritMichal Rostecki proposed openstack/kolla: [WIP] Add Ansible support for Magnum  https://review.openstack.org/23622316:16
*** daneyon has joined #kolla16:24
*** daneyon_ has joined #kolla16:25
*** daneyon has quit IRC16:29
*** achanda has joined #kolla17:06
-openstackstatus- NOTICE: Gerrit will be offline for project renames starting at 1800 UTC.17:06
*** ChanServ changes topic to "Gerrit will be offline for project renames starting at 1800 UTC."17:06
*** achanda has quit IRC17:26
*** cemason has joined #kolla17:57
-openstackstatus- NOTICE: Gerrit is offline for project renames.17:59
*** ChanServ changes topic to "Gerrit is offline for project renames."17:59
*** cloudnautique has joined #kolla18:02
*** cloudnautique has quit IRC18:04
*** cloudnautique has joined #kolla18:06
*** cemason has quit IRC18:12
*** ChanServ changes topic to "Kolla IRC meetings on Wednesday - Agenda @ https://wiki.openstack.org/wiki/Meetings/Kolla - IRC channel is *LOGGED* @ http://eavesdrop.openstack.org/irclogs/%23kolla/"18:35
-openstackstatus- NOTICE: Gerrit is back online. Github transfers are in progress and should be complete by 1900 UTC.18:35
*** achanda has joined #kolla18:37
*** openstackgerrit has quit IRC18:46
*** openstackgerrit has joined #kolla18:46
*** cemason has joined #kolla19:13
*** dimsum__ has quit IRC19:14
*** cemason has quit IRC19:18
*** openstackgerrit has quit IRC19:31
*** openstackgerrit has joined #kolla19:31
*** cemason has joined #kolla19:37
*** cloudnautique has quit IRC19:46
*** cloudnautique has joined #kolla19:47
*** achanda has quit IRC19:49
*** cloudnautique has quit IRC19:52
*** dimsum__ has joined #kolla20:15
*** dimsum__ has quit IRC20:20
*** vinkman has joined #kolla20:42
*** jtriley has joined #kolla20:55
*** achanda has joined #kolla21:10
*** cemason has quit IRC21:12
*** jtriley has quit IRC21:24
*** cemason has joined #kolla21:43
*** dimsum__ has joined #kolla22:18
*** dimsum__ has quit IRC22:23
*** dimsum__ has joined #kolla22:30
*** achanda has quit IRC22:32
*** cloudnautique has joined #kolla22:50
*** cloudnautique has quit IRC22:57

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!