16:01:09 <m3m0> #startmeeting openstack-freezer 14-01-2016
16:01:10 <openstack> Meeting started Thu Jan 14 16:01:09 2016 UTC and is due to finish in 60 minutes.  The chair is m3m0. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:13 <openstack> The meeting name has been set to 'openstack_freezer_14_01_2016'
16:01:22 <m3m0> All: meetings notes available in real time at: https://etherpad.openstack.org/p/freezer_meetings
16:01:29 <m3m0> hey guys ready to rumble?
16:01:36 <ddieterly> yes
16:01:39 <m3m0> who is here today? please raise your hand
16:01:41 <m3m0> o/
16:01:45 <ddieterly> o/
16:01:54 <reldan> o/
16:02:52 <m3m0> ok let's start
16:03:05 <m3m0> #topic elasticsearch
16:03:23 <m3m0> we need to create a new mode in freezer to backup and restore elasticsearch
16:03:35 <m3m0> has anyone looked at it?
16:03:50 <ddieterly> i looked at es this morning
16:04:11 <ddieterly> so, the req is to be able to backup /var/log, audit logs (whatever that means), and es
16:04:43 <m3m0> in case of cluster, do we need to backup only the master one?
16:04:51 <ddieterly> i think /var/log and audit logs can already be backed up in freezer thru config
16:05:29 <ddieterly> for es, we will need to mount a shared volume and snapshot es to that shared volume and then back the snapshot up
16:05:32 <m3m0> if that's the case then no new mode is required
16:05:54 <m3m0> why a shared volume?
16:06:18 <ddieterly> the alternative is to backup each snapshot on each node of the cluster (i think)
16:07:13 <m3m0> is it necessary to backup each node?
16:08:07 <m3m0> by the way do you want to take ownership of this ddieterly?
16:08:13 <ddieterly> i think we would technically need to backup each shard
16:08:54 <ddieterly> to get a logically consistent view of the entire db, it seems easiest to snapshot to a shared repo on a single volume and back that up
16:08:56 <ddieterly> https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
16:10:41 <ddieterly> "shared file system repository" seems to be the most straightforward way to do it
16:10:45 <m3m0> could be a great idea to create a repository plugin for openstack
16:10:50 <ddieterly> but, i'm not an expert
16:11:01 <reldan> I think that probably it may be better to just add a plugin for swift
16:11:02 <reldan> yes
16:11:26 <reldan> Something like that https://github.com/elastic/elasticsearch-cloud-aws#s3-repository only for swift
16:12:04 <m3m0> All: meetings notes available in real time at: https://etherpad.openstack.org/p/freezer_meetings
16:12:11 <ddieterly> so, a plugin for es that stores to swift?
16:12:21 <reldan> https://github.com/wikimedia/search-repository-swift
16:12:22 <reldan> Yes
16:12:41 <ddieterly> that's probably what tsv was talking about in the email thread
16:12:54 <reldan> It seems that wikimedia already has a swift plugin
16:13:04 <m3m0> but reldan, does that break the swift, ssh, local storage functionality?
16:13:23 <reldan> In that case we just don’t need freezer to do a backup
16:13:53 <reldan> es will store all data in swift by itself
16:14:07 <m3m0> but what about the case where we want ssh?
16:14:14 <ddieterly> we would need to schedule and initiate the backup, right?
16:14:26 <m3m0> should we use 2 approaches for this?
16:15:11 <reldan> yes, sure we can integrate it with scheduler
16:15:12 <reldan> PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
16:15:17 <reldan> to execute something like that
16:16:23 <reldan> Otherwise we will use 1) the Elasticsearch backup to save data on disk 2) Freezer to store the backup on Swift
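A minimal sketch of that snapshot step, assuming a shared-filesystem repository; the endpoint, repository name and mount path are placeholders, and only the standard _snapshot REST API is used:

    # Sketch of the built-in es snapshot step discussed above; a freezer fs
    # backup of the snapshot directory can then push the data to Swift.
    import requests

    ES = "http://localhost:9200"  # placeholder endpoint

    # 1) Register a shared-filesystem repository (the path must be in
    #    path.repo on every node and visible to all of them).
    requests.put(
        ES + "/_snapshot/es_backups",
        json={"type": "fs", "settings": {"location": "/mnt/es_snapshots"}},
    ).raise_for_status()

    # 2) Trigger a snapshot and block until it completes, as in the PUT above.
    requests.put(
        ES + "/_snapshot/es_backups/snapshot_1",
        params={"wait_for_completion": "true"},
    ).raise_for_status()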
16:16:27 <m3m0> ok, so first step is to create a bp and/or spec to review this
16:17:14 <m3m0> ddieterly what do you think?
16:17:48 <ddieterly> so, is the first step to investigate the options: 1) plugin or 2) just use freezer?
16:17:59 <m3m0> yes and create a spec
16:18:04 <reldan> Agree
16:18:12 <ddieterly> ok
16:18:46 <ddieterly> so, the first step is investigation?
16:18:57 <m3m0> yes
16:19:41 <ddieterly> ok
16:20:14 <ddieterly> i'm assuming that pierre can do the config in hlm to backup /var/log and the audit logs?
16:20:46 <m3m0> we need to create the configuration file and Slashme can deploy it
16:21:13 <m3m0> and of course we need to test it in a similar environment
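A hedged sketch of what that configuration could look like as a freezer-scheduler job document (registered with the freezer API as JSON); the container and backup names are placeholders, and the freezer_action keys are assumed to mirror the agent's CLI options:

    # Hypothetical freezer job document for the /var/log case mentioned above.
    var_log_job = {
        "description": "centralized logging backup",
        "job_actions": [{
            "freezer_action": {
                "action": "backup",
                "mode": "fs",
                "path_to_backup": "/var/log",
                "container": "freezer_logs_backups",  # Swift container (assumed name)
                "backup_name": "var-log",
            },
            "max_retries": 3,
            "max_retries_interval": 60,
        }],
    }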
16:22:09 <ddieterly> do we need to address the other questions that are in the blueprint?
16:22:21 <ddieterly> https://blueprints.launchpad.net/freezer/+spec/backup-centralized-logging-data
16:22:33 <m3m0> yes, please feel free
16:23:37 <ddieterly> what I mean is, do any of the topics need to be addressed at this time?
16:24:25 <ddieterly> so, i'm assuming that you all are very busy, and the last thing you need is more work
16:24:42 <ddieterly> so, it looks like i'll be investigating the plugin?
16:25:39 <m3m0> aaaa yes we have 4 more topics
16:25:40 <reldan> Yes and probably they have special requirements about incremental backups, encryption
16:25:48 <daemontool> I'm here sorry
16:25:50 <m3m0> so regarding elasticsearch are we clear on the next step
16:25:51 <m3m0> ?
16:25:59 <reldan> I don't know - can we add encryption to the plugin
16:26:05 <reldan> yes
16:26:15 <ddieterly> investigate plugin is the next step?
16:26:44 <daemontool> *I think*
16:27:00 <daemontool> and I might be wrong
16:27:09 <daemontool> the snapshotting feature from es
16:27:14 <daemontool> is similar to what we do with lvm
16:27:29 <daemontool> but the es built in snapshotting
16:27:47 <daemontool> offers a solution to execute backups of specific indices/documents
16:28:09 <daemontool> so ddieterly  if you need a quick solution, I see the following options
16:28:36 <daemontool> 1) execute a fs backup + lvm snapshot on each elastic search node
16:29:14 <daemontool> 2) create a job to execute a script (e.g. with curl) that will create a snapshot using the elasticsearch built-in snapshot
16:29:46 <daemontool> and then there's another job that will back up those files from the file system; we need to understand where es stores those files in the filesystem when the snapshot is triggered
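For option 1, a hedged sketch of a per-node job: an fs backup taken from an LVM snapshot of the volume holding the es data. All volume names, sizes and paths are assumptions, and the lvm_* keys are assumed to mirror the agent's --lvm-* flags:

    # Hedged sketch of option 1: fs backup from an LVM read-only snapshot on
    # each elasticsearch node; values below are placeholders.
    es_node_job = {
        "description": "es node fs backup via lvm snapshot",
        "job_actions": [{
            "freezer_action": {
                "action": "backup",
                "mode": "fs",
                "backup_name": "es-data",
                "container": "freezer_es_backups",
                "lvm_srcvol": "/dev/vg0/es_data",       # volume holding the es data dir
                "lvm_snapname": "es_data_snap",
                "lvm_snapsize": "2G",
                "lvm_dirmount": "/mnt/es_data_snap",
                "path_to_backup": "/mnt/es_data_snap",  # back up the mounted snapshot
            },
            "max_retries": 3,
        }],
    }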
16:30:08 <ddieterly> i think 1 is not an option because of db consistency concerns
16:30:26 <m3m0> we can use sessions for that
16:30:27 <vannif> you pass the location to es as part of the curl invocation I think
16:30:42 <daemontool> ddieterly, with mongodb I did that in production in the public cloud in hp many times
16:30:49 <daemontool> and every time the backup was consistent
16:30:53 <vannif> I agree about the consistency issues with 1)
16:30:55 <daemontool> but the data wasn't sharded
16:31:01 <daemontool> so
16:31:10 <daemontool> there are two possible consistency issues there
16:31:20 <vannif> s/issue/concern
16:31:34 <daemontool> 1) half index in memory - half data written to the disk, generating data corruption
16:31:48 <daemontool> 2) inconsistencies with data sharded across multiple nodes
16:31:56 <daemontool> do you agree with that?
16:32:01 <ddieterly> yes
16:32:04 <daemontool> ok
16:32:16 <daemontool> for 1) I think elasticsearch, like mongo,
16:32:26 <daemontool> writes the journal log file in the same directory as where the data is stored
16:32:46 <daemontool> so if a snapshot with lvm is created (ro snap, immutable)
16:32:51 <daemontool> the data doesn't change
16:32:55 <daemontool> the backup is executed
16:33:05 <daemontool> and the data is crash consistent
16:33:17 <daemontool> which would be like the power suddenly going away on that node
16:33:29 <daemontool> anyone see any issue here?
16:33:44 <daemontool> so we need to understand if elasticsearch stores journal logs
16:33:52 <daemontool> I think so
16:33:58 <daemontool> but I might be wrong
16:34:10 <vannif> and that might change
16:34:11 <daemontool> all good so far?
16:34:20 <ddieterly> i think so
16:34:36 <m3m0> yes, time is a concern and we have 3 more topics, should we continue with this or move forward?
16:34:36 <daemontool> vannif,  in mongodb the data is stored on the same directory
16:34:39 <daemontool> /var/lib/mongo
16:34:48 <daemontool> m3m0,  one sec
16:34:50 <daemontool> this is critical
16:34:55 <daemontool> sorry
16:35:12 <daemontool> because the #1 solution would be easy to implement for your needs
16:35:17 <daemontool> as no code needs to be written
16:35:24 <daemontool> for the issue #2
16:35:29 <daemontool> we have a feature called job session ddieterly
16:35:36 <ddieterly> yes, i like 1 then ;-)
16:35:46 <daemontool> I'm just explaining, then you guys decide
16:35:48 <daemontool> :)
16:35:56 <daemontool> on #2
16:36:11 <daemontool> job session is used to execute backups at nearly the same time on multiple nodes
16:36:25 <daemontool> and that can be used to solve the shards inconsistencies
16:36:27 <daemontool> I think
16:36:30 <daemontool> before writing code
16:36:34 <daemontool> it's worth testing this
16:36:38 <daemontool> because it's fast
16:36:46 <ddieterly> i don't think that that would give any guarantees
16:36:47 <daemontool> and will help us to improve job session
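A rough sketch of the job-session idea, with field names that are assumptions based on the freezer-api session document:

    # Hedged sketch: a session groups jobs on several nodes so the schedulers
    # start them at (nearly) the same time. Field names are assumptions.
    es_session = {
        "description": "coordinated es shard backup",
        "hold_off": 60,  # seconds to wait so all attached jobs start together
    }
    # Each per-node backup job is then attached to this session (via its
    # session_id) so the schedulers on the es nodes trigger them together.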
16:36:59 <vannif> well, from what I understand es has 2 ways of writing data: by default it writes data to all the shards before returning a positive ack to the user. that would result in all the shards having the data in their disks or journals
16:37:10 <m3m0> but I don't know why we want to backup all nodes, aren't they supposed to be replicas of the master node?
16:37:18 <daemontool> m3m0,  it depends,
16:37:26 <daemontool> elasticsearch, to scale and reduce I/O,
16:37:40 <ddieterly> we need to back up all shards
16:37:42 <daemontool> splits the data across multiple nodes, called shards
16:37:43 <vannif> another way is less secure: write data to the master and return a positive ack to the user. *then* replicate
16:37:51 <daemontool> ddieterly,  ++
16:38:00 <daemontool> I think with job session
16:38:12 <daemontool> the solution can be acceptable
16:38:18 <daemontool> because we have the same issue anyway
16:38:27 <daemontool> even if we use the snapshotting
16:38:30 <daemontool> built in feature in es
16:38:33 <daemontool> that needs to be executed
16:38:39 <daemontool> at nearly the same time
16:38:42 <daemontool> across all the nodes
16:38:56 <ddieterly> i'm not liking that; no guarantees
16:39:03 <daemontool> vannif, please, can you explain the job session better to ddieterly offline
16:39:04 <daemontool> ?
16:39:05 <ddieterly> depends on timing
16:39:10 <daemontool> ddieterly,  yes I agree
16:39:22 <daemontool> in helion all the nodes are synced with an ntp node
16:39:25 <vannif> sure
16:39:31 <daemontool> but yes, you are right
16:39:39 <daemontool> no doubt about that, it is best effort
16:39:52 <daemontool> ddieterly, are you comfortable to test that?
16:39:58 <daemontool> or do you want to go with other solutions?
16:40:00 <ddieterly> so, #1 seems reasonable if it guarantees consistency
16:40:24 <daemontool> I think if the writes of es are atomic
16:40:30 <daemontool> the consistency should be OK
16:40:33 <daemontool> but
16:40:41 <daemontool> 100% consistency cannot be guaranteed
16:40:45 <daemontool> :(
16:41:05 <daemontool> it's a computer science challenge to execute two actions at exactly the same time on multiple nodes
16:41:12 <daemontool> not only our problem
16:41:21 <ddieterly> the only way that 100% consistency can be guaranteed seems to be to use the snapshot feature of es
16:41:34 <daemontool> ok
16:41:40 <daemontool> then my advice would be
16:41:46 <daemontool> to write a script
16:41:51 <daemontool> that executes the snapshot with curl
16:42:09 <daemontool> and then executes the backup of the data as an fs backup with the agent
16:42:18 <daemontool> that wouldn't require writing code
16:42:20 <vannif> I think #1 is reasonable, even though it relies on some assumptions. It does not require any new backup-mode anyway. we can leave an elasticsearch-mode for direct interaction with es (i.e. request a snapshot)
16:42:22 <ddieterly> if we can snapshot to each node, then we can just back that up with freezer
16:42:41 <daemontool> ddieterly,  yes
16:42:43 <daemontool> that was #2
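A hedged sketch of that second step: once the es snapshot has landed in the shared snapshot directory, a plain fs backup of that directory with the agent pushes it to Swift. Paths, names and the agent entry-point name are assumptions:

    # Hedged sketch: back up the es snapshot directory with the freezer agent.
    import subprocess

    subprocess.check_call([
        "freezer-agent",              # entry point name may differ by release
        "--action", "backup",
        "--mode", "fs",
        "--path-to-backup", "/mnt/es_snapshots",   # shared snapshot repo (placeholder)
        "--container", "freezer_es_backups",       # Swift container (placeholder)
        "--backup-name", "es-snapshots",
    ])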
16:42:57 <daemontool> now, we can decide this even tomorrow
16:43:01 <ddieterly> so, we need to investigate whether es can do that
16:43:04 <daemontool> yes
16:43:07 <ddieterly> if so, that seems the best plan
16:43:10 <daemontool> ddieterly,  ok
16:43:18 <daemontool> are you comfortable? can we move forward?
16:43:21 <ddieterly> if not, then see if we can do #1
16:43:30 <daemontool> please vannif, can you explain job sessions to ddieterly offline as well?
16:43:46 <ddieterly> i'll setup a meeting
16:43:52 <daemontool> so we can move on to the other topics
16:43:56 <daemontool> we can do a hangout meeting
16:43:59 <daemontool> so I can participate
16:44:01 <daemontool> as you want
16:44:04 <daemontool> or an irc meeting
16:44:10 <ddieterly> google hangout?
16:44:13 <daemontool> yes
16:44:19 <ddieterly> sure, i'll set that up
16:44:20 <daemontool> hangout I think is better
16:44:21 <daemontool> ok
16:44:22 <daemontool> ty
16:44:26 <ddieterly> np
16:44:31 <daemontool> m3m0, let's run fast :)
16:44:54 <m3m0> #topic cinder and nova backups
16:45:03 <m3m0> what's the status on this?
16:45:12 <daemontool> Mr reldan
16:45:13 <daemontool> :)
16:46:18 <reldan> We have nova ephemeral disk backup (not incremental), cindernative backup (cannot be done on attached volumes (should be possible from Liberty)), cinder backup (non-incremental)
16:46:46 <m3m0> is this working now?
16:46:47 <reldan> Currently we cannot make a backup of whole vm with attached volumes
16:47:19 <daemontool> reldan,  that what I think we need
16:47:31 <daemontool> because currently no one is providing a solution for that
16:47:40 <daemontool> like nova vm + attached volumes
16:47:56 <reldan> Yes for nova with ephemeral, No - for nova with bootable cinder volume (can be done through cinder-backup)
16:48:02 <m3m0> can we inspect the vm and check if it has attached volumes and then execute nova and/or cinder backups?
16:48:20 <daemontool> m3m0,  yes from the API
16:48:25 <daemontool> from the Nova API
16:48:37 <daemontool> frescof, please provide your inputs if any ^^
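A hedged sketch of that inspection via the Nova API, followed by a cindernative backup of each attached volume with force=True (the --force reldan mentions later); the credentials, names and attachment attributes are placeholders/assumptions:

    # Hedged sketch: find attached volumes per VM and request cinder backups.
    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from novaclient import client as nova_client
    from cinderclient import client as cinder_client

    auth = v3.Password(
        auth_url="http://keystone:5000/v3",   # placeholder v3 endpoint
        username="backup", password="secret",
        project_name="admin",
        user_domain_name="Default", project_domain_name="Default",
    )
    sess = session.Session(auth=auth)
    nova = nova_client.Client("2", session=sess)
    cinder = cinder_client.Client("2", session=sess)

    for server in nova.servers.list():
        # Volume attachments for this instance (empty list means ephemeral-only).
        for att in nova.volumes.get_server_volumes(server.id):
            # force=True allows backing up an in-use volume, at the cost of
            # only crash consistency.
            cinder.backups.create(
                att.volumeId,
                name="%s-%s" % (server.name, att.volumeId),
                force=True,
            )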
16:48:47 <reldan> And probably we have a problem with auth_url v3
16:49:06 <m3m0> why?
16:50:01 <reldan> I don’t know. But I saw that it cannot authorize (trying to use wrong http address or something like that)
16:50:16 <daemontool> mmhhh
16:50:32 <reldan> m3m0: We can inspect attached volumes - yes
16:50:32 <daemontool> we should be able to do that
16:50:54 <reldan> But there is still a problem with consistency
16:51:01 <daemontool> reldan,  at least the orchestration of backing up vms + attached volumes
16:51:10 <reldan> Any backup/snapshot on attached volume can be corrupted
16:51:11 <daemontool> I think it should be  provided
16:51:26 <daemontool> why?
16:51:33 <daemontool> it's crash consistent anyway
16:51:58 <reldan> because we use --force to do so
16:52:00 <daemontool> it's like backing up /var/lib/mysql with lvm without flushing the in memory data of mysql
16:53:11 <daemontool> there's no other way to do that from outside the vm
16:53:15 <daemontool> I think :(
16:53:41 <reldan> I suppose the same.
16:53:54 <m3m0> unless we define a new mode in freezer that inspects the architecture of the vm and executes internal and external backups
16:54:02 <m3m0> accordingly
16:54:06 <daemontool> I think
16:54:10 <daemontool> that make sense
16:54:13 <daemontool> but it's up to the user
16:54:22 <daemontool> if he want's to use
16:54:23 <reldan> But if we want to have a backup that contains (let's say) 3 cinder volumes, 1 nova instance with information about where we should mount each volume - we should define such a format
16:54:56 <m3m0> but wait, each volume is a backup right?
16:55:29 <daemontool> m3m0,  yes
16:55:54 <reldan> If I understand it correctly, the goal is implementing a full backup of an instance with all attached volumes. In this case we should implement ephemeral disk backup, backup of each volume and metainformation - how to restore it
16:56:00 <reldan> how to reassemble instance
16:56:15 <reldan> It’s like metabackup of backups
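A purely hypothetical sketch of such a format, just to make the idea concrete; no such document exists in freezer and every field here is an assumption:

    # Hypothetical "metabackup" document tying together the instance backup
    # and the per-volume backups, with enough metadata to reassemble the VM.
    instance_backup_meta = {
        "instance": {
            "backup_id": "nova-ephemeral-backup-id",
            "flavor": "m1.medium",
            "image": "ubuntu-14.04",
        },
        "volumes": [
            {"backup_id": "cinder-backup-id-1", "device": "/dev/vdb"},
            {"backup_id": "cinder-backup-id-2", "device": "/dev/vdc"},
        ],
    }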
16:56:29 <m3m0> the instance should be up and running again, it's not freezer's responsibility to do that
16:56:40 <m3m0> the jobs for restore should only contain paths
16:57:01 <reldan> So if you terminate your instance- you cannot restore it?
16:57:13 <m3m0> nop
16:57:31 <daemontool> mmhhh
16:57:33 <m3m0> you need somewhere to restore it
16:57:44 <daemontool> I think probably we need to keep it a bit simple, or we go through a dark sea
16:58:06 <m3m0> we can have this discussion offline
16:58:33 <reldan> Let's just say we have two openstack installations. If I understand the task correctly - we should be able to create a backup in installation1 and restore the same configuration in installation2
16:58:53 <daemontool> yes
16:59:01 <daemontool> so we can offer disaster recovery capabilities
16:59:27 <daemontool> let's do this
16:59:32 <m3m0> I disagree
16:59:33 <reldan> In this case it would be great to create and discuss a blueprint
16:59:33 <daemontool> I'll write a bp for this stuff
16:59:43 <daemontool> and we can then discuss it
16:59:46 <daemontool> change it and so on
16:59:49 <daemontool> m3m0,  is that ok?
16:59:52 <m3m0> yes, of course
16:59:55 <reldan> yes
16:59:55 <daemontool> ok
16:59:57 <m3m0> we are running late
16:59:59 <daemontool> let's move forward
17:00:00 <daemontool> yep
17:00:06 <m3m0> and we have 2 more topics
17:00:11 <m3m0> should we do it next week?
17:00:20 <m3m0> python freezer client and list of backups
17:00:35 <daemontool> let's do it in 5 minutes now
17:00:45 <daemontool> python freezerclient
17:00:47 <daemontool> let's skip it
17:00:53 <daemontool> but list of backups
17:00:59 <daemontool> it's fundamental that we have it in mitaka
17:01:03 <daemontool> vannif, ^^
17:01:11 <daemontool> is essential...
17:01:23 <daemontool> we need to be able to list backups and restore using the scheduler
17:01:26 <m3m0> yes, and it's not complicated, the ui has that functionality already
17:01:27 <daemontool> retrieving data at least from the api
17:01:32 <daemontool> m3m0,  yep
17:01:35 <m3m0> it's a matter of replicating that
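A hedged sketch of the listing the scheduler needs, querying the freezer-api backups endpoint directly; the endpoint, token handling and response field names are assumptions (the scheduler would reuse its existing keystone session):

    # Hedged sketch: list backups from the freezer api for display/restore.
    import requests

    FREEZER_API = "http://freezer-api:9090"   # placeholder endpoint
    TOKEN = "<keystone token>"                # obtained via keystoneauth in practice

    resp = requests.get(
        FREEZER_API + "/v1/backups",
        headers={"X-Auth-Token": TOKEN},
    )
    resp.raise_for_status()
    for backup in resp.json().get("backups", []):
        meta = backup.get("backup_metadata", backup)
        print(meta.get("backup_name"), meta.get("time_stamp"))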
17:01:37 <daemontool> vannif, can you do that please?
17:01:53 <daemontool> or m3m0  if your workload on the web ui
17:02:00 <daemontool> it's not huge
17:02:04 <vannif> yes
17:02:09 <daemontool> vannif,  ok thank you
17:02:14 <daemontool> then we'll move that stuff
17:02:18 <daemontool> in the python-freezerclient
17:02:20 <daemontool> ok
17:02:28 <vannif> I
17:02:37 <daemontool> You
17:02:42 <daemontool> lol
17:02:54 <vannif> I've started to look at how to use cliff for the freezerclient
17:03:03 <m3m0> I'm very busy but I can do that if vannif is busy as well
17:03:05 <daemontool> vannif,  yes but we cannot do that for now
17:03:11 <daemontool> vannif,  can do that
17:03:23 <daemontool> sorry
17:03:27 <daemontool> we cannot do that
17:03:28 <daemontool> for now
17:03:38 <vannif> you mean no cliff ?
17:03:41 <daemontool> we can do that after we split the code
17:03:42 <daemontool> yes
17:03:52 <vannif> oh. ok. it's quicker then :)
17:03:52 <m3m0> wait wait
17:04:03 <m3m0> list from scheduler and the split?
17:04:19 <daemontool> list from scheduler can be done now
17:04:28 <daemontool> python-freezerclient split code can be done now
17:04:40 <daemontool> python-freezerclient using cliff after the split
17:04:45 <m3m0> we can split vannif in 2
17:04:48 <daemontool> haha
17:04:51 <daemontool> even in 3
17:04:58 <daemontool> we can cut it in 3
17:05:10 <m3m0> the italian way of doing business :P
17:05:19 <daemontool> and doing sausages
17:05:19 <m3m0> ok guys what's the verdict?
17:05:22 <daemontool> ok
17:05:23 <daemontool> so
17:05:32 <daemontool> vannif, implement the job listing
17:05:37 <daemontool> I do the python-freezerclient split
17:05:46 <daemontool> after that
17:06:00 <vannif> ok
17:06:03 <m3m0> #agree
17:06:05 <daemontool> we can use cliff on the freezerclient
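A hedged sketch of what a cliff-based backup listing command could look like in python-freezerclient after the split; only the cliff classes are standard, the client attribute and backup field names are assumptions:

    # Hedged sketch of a cliff Lister command for the split-out client.
    from cliff.lister import Lister


    class BackupList(Lister):
        """List backups stored in the freezer api."""

        def take_action(self, parsed_args):
            # self.app.client would be the freezer api client set up by the
            # cliff App (hypothetical attribute).
            backups = self.app.client.backups.list()
            columns = ("Backup ID", "Name", "Timestamp")
            rows = ((b.get("backup_id"),
                     b.get("backup_name"),
                     b.get("time_stamp")) for b in backups)
            return columns, rows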
17:06:10 <daemontool> ++
17:06:11 <daemontool> ok
17:06:17 <daemontool> is that all?
17:06:41 <m3m0> yes
17:06:44 <m3m0> for now...
17:06:59 <daemontool> I'm going to write
17:07:03 <m3m0> ok guys thanks to all for your time
17:07:04 <daemontool> the bp for nova and cinder?
17:07:04 <daemontool> ok
17:07:08 <m3m0> perfect
17:07:16 <m3m0> do that daemontool
17:07:23 <daemontool> I'll do it m3m0
17:07:25 <daemontool> lol
17:07:27 <daemontool> :)
17:07:33 <daemontool> you please cut vannif in 3
17:07:34 <m3m0> #endmeeting