16:01:09 #startmeeting openstack-freezer 14-01-2016
16:01:10 Meeting started Thu Jan 14 16:01:09 2016 UTC and is due to finish in 60 minutes. The chair is m3m0. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:13 The meeting name has been set to 'openstack_freezer_14_01_2016'
16:01:22 All: meeting notes available in real time at: https://etherpad.openstack.org/p/freezer_meetings
16:01:29 hey guys, ready to rumble?
16:01:36 yes
16:01:39 who is here today? please raise your hand
16:01:41 o/
16:01:45 o/
16:01:54 o/
16:02:52 ok let's start
16:03:05 #topic elasticsearch
16:03:23 we need to create a new mode in freezer to backup and restore elasticsearch
16:03:35 has anyone looked at it?
16:03:50 i looked at es this morning
16:04:11 so, the req is to be able to backup /var/log, audit logs (whatever that means), and es
16:04:43 in case of a cluster, do we need to backup only the master one?
16:04:51 i think /var/log and audit logs can already be backed up in freezer through config
16:05:29 for es, we will need to mount a shared volume, snapshot es to that shared volume, and then back the snapshot up
16:05:32 if that's the case then no new mode is required
16:05:54 why a shared volume?
16:06:18 the alternative is to backup each snapshot on each node of the cluster (i think)
16:07:13 is it necessary to backup each node?
16:08:07 by the way, do you want to take ownership of this ddieterly?
16:08:13 i think we would technically need to backup each shard
16:08:54 to get a logically consistent view of the entire db, it seems easiest to snapshot to a shared repo on a single volume and back that up
16:08:56 https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
16:10:41 "shared file system repository" seems to be the most straightforward way to do it
16:10:45 it could be a great idea to create a repository plugin for openstack
16:10:50 but, i'm not an expert
16:11:01 I think it may probably be better to just add a plugin for swift
16:11:02 yes
16:11:26 Something like https://github.com/elastic/elasticsearch-cloud-aws#s3-repository only for swift
16:12:04 All: meeting notes available in real time at: https://etherpad.openstack.org/p/freezer_meetings
16:12:11 so, a plugin for es that stores to swift?
16:12:21 https://github.com/wikimedia/search-repository-swift
16:12:22 Yes
16:12:41 that's probably what tsv was talking about in the email thread
16:12:54 It seems that wikimedia already has a swift plugin
16:13:04 but reldan, does that break the swift, ssh, local storage functionality?
16:13:23 In that case we just don't need freezer to do a backup
16:13:53 es will store all data in swift by itself
16:14:07 but in the case we want ssh?
16:14:14 we would need to schedule and initiate the backup, right?
16:14:26 should we use 2 approaches for this?
16:15:11 yes, sure we can integrate it with the scheduler
16:15:12 PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
16:15:17 to execute something like that
16:16:23 Otherwise we will 1) use the ElasticSearch backup to save data on disk 2) use Freezer to store the backup on Swift
16:16:27 ok, so the first step is to create a bp and/or spec to review this
16:17:14 ddieterly what do you think?
16:17:48 so, is the first step to investigate the options: 1) plugin or 2) just use freezer?
16:17:59 yes, and create a spec
16:18:04 Agree
16:18:12 ok
16:18:46 so, the first step is investigation?
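For reference, a minimal sketch of the snapshot call quoted above, using the Elasticsearch REST API: it registers a "shared file system repository" and then triggers the snapshot with wait_for_completion. The repository name and snapshot name come from the log; the endpoint and the mount path are assumed values for illustration only.

    # Sketch only: register a shared-filesystem snapshot repository and
    # trigger a snapshot through the Elasticsearch REST API.
    import requests

    ES = "http://localhost:9200"          # assumed ES endpoint
    REPO_LOCATION = "/mnt/es_snapshots"   # assumed shared volume mounted on every node

    # Register (or update) the "shared file system repository" discussed above.
    requests.put(
        "%s/_snapshot/my_backup" % ES,
        json={"type": "fs", "settings": {"location": REPO_LOCATION}},
    ).raise_for_status()

    # The call quoted in the log; wait_for_completion=true blocks until the
    # snapshot is finished, so a follow-up filesystem backup can start safely.
    resp = requests.put(
        "%s/_snapshot/my_backup/snapshot_1?wait_for_completion=true" % ES)
    resp.raise_for_status()
    print(resp.json())

Because the second call blocks until the snapshot is written, a freezer job scheduled right after it would always see a complete snapshot directory.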
16:18:57 yes
16:19:41 ok
16:20:14 i'm assuming that pierre can do the config in hlm to backup /var/log and the audit logs?
16:20:46 we need to create the configuration file and Slashme can deploy it
16:21:13 and of course we need to test it in a similar environment
16:22:09 do we need to address the other questions that are in the blueprint?
16:22:21 https://blueprints.launchpad.net/freezer/+spec/backup-centralized-logging-data
16:22:33 yes, please feel free
16:23:37 what i mean is, do any of the topics need to be addressed at this time?
16:24:25 so, i'm assuming that you all are very busy, and the last thing you need is more work
16:24:42 so, it looks like i'll be investigating the plugin?
16:25:39 aaaa yes we have 4 more topics
16:25:40 Yes, and probably they have special requirements about incremental backups, encryption
16:25:48 I'm here sorry
16:25:50 so regarding elasticsearch, are we clear on the next step?
16:25:59 I don't know - can we add encryption to the plugin?
16:26:05 yes
16:26:15 is investigating the plugin the next step?
16:26:44 *I think*
16:27:00 and I might be wrong
16:27:09 the snapshotting feature from es
16:27:14 is similar to what we do with lvm
16:27:29 but the es built-in snapshotting
16:27:47 offers a solution to execute backups of specific indexes/documents
16:28:09 so ddieterly, if you need a quick solution, I see the following options (sketched after this section)
16:28:36 1) execute a fs backup + lvm snapshot on each elasticsearch node
16:29:14 2) create a job to execute a script (i.e. with curl) that will create a snapshot using the elasticsearch built-in snapshot
16:29:46 and then there's another job that will back up those files in the filesystem; we need to understand where es stores those files when the snapshot is triggered
16:30:08 i think 1 is not an option because of db consistency concerns
16:30:26 we can use sessions for that
16:30:27 you pass the location to es as part of the curl invocation I think
16:30:42 ddieterly, with mongodb I did that in production in the public cloud at hp many times
16:30:49 and every time the backup was consistent
16:30:53 I agree about the consistency issues with 1)
16:30:55 but the data wasn't sharded
16:31:01 so
16:31:10 there are two possible consistency concerns there
16:31:34 1) half the index in memory, half the data written to disk, generating data corruption
16:31:48 2) inconsistencies with data sharded across multiple nodes
16:31:56 do you agree with that?
16:32:01 yes
16:32:04 ok
16:32:16 for 1) I think elasticsearch, like mongo,
16:32:26 writes the journal log file in the same directory where the data is stored
16:32:46 so if a snapshot with lvm is created (ro snap, immutable)
16:32:51 the data doesn't change
16:32:55 the backup is executed
16:33:05 and the data is crash consistent
16:33:17 which would be like the power suddenly going away on that node
16:33:29 anyone see any issue here?
16:33:44 so we need to understand if elasticsearch stores journal logs
16:33:52 I think so
16:33:58 but I might be wrong
16:34:10 and that might change
16:34:11 all good so far?
16:34:20 i think so
16:34:36 yes, time is a concern and we have 3 more topics, should we continue with this or move forward?
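A hedged sketch of what a per-node scheduler job for option 1 (fs backup + LVM snapshot of the Elasticsearch data directory) could look like. The option names mirror freezer-agent CLI flags as remembered and may differ between releases; the data path, snapshot mount point, snapshot size and swift container are all assumed values, not decisions from the meeting.

    # Sketch of option 1, assuming freezer scheduler job documents with
    # "job_actions"/"freezer_action" and freezer-agent style option names.
    import json

    job = {
        "description": "es-data-fs-backup",
        "job_actions": [
            {
                "freezer_action": {
                    "action": "backup",
                    "mode": "fs",
                    "backup_name": "es_data",
                    "path_to_backup": "/var/lib/elasticsearch",  # assumed data dir
                    "lvm_auto_snap": "/var/lib/elasticsearch",   # derive the LVM volume from the path
                    "lvm_dirmount": "/mnt/freezer_es_snap",      # assumed read-only snapshot mount
                    "lvm_snapsize": "2G",
                    "container": "es_backups",                   # assumed swift container
                },
                "max_retries": 3,
                "max_retries_interval": 60,
            }
        ],
    }

    print(json.dumps(job, indent=2))

The same job would be pushed to every node of the cluster, which is exactly where the sharding consistency concern (point 2 above) comes from.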
16:34:36 vannif, in mongodb the data is stored in the same directory
16:34:39 /var/lib/mongo
16:34:48 m3m0, one sec
16:34:50 this is critical
16:34:55 sorry
16:35:12 because the #1 solution would be easy to implement for your needs
16:35:17 as no code needs to be written
16:35:24 for issue #2
16:35:29 we have a feature called job session, ddieterly
16:35:36 yes, i like 1 then ;-)
16:35:46 I'm just explaining, then you guys decide
16:35:48 :)
16:35:56 on #2
16:36:11 job session is used to execute backups at near the same time on multiple nodes
16:36:25 and that can be used to solve the shard inconsistencies
16:36:27 I think
16:36:30 before writing code
16:36:34 it's worth testing this
16:36:38 because it's fast
16:36:46 i don't think that would give any guarantees
16:36:47 and it will help us to improve job session
16:36:59 well, from what I understand es has 2 ways of writing data: by default it writes data to all the shards before returning a positive ack to the user. that would result in all the shards having the data in their disks or journals
16:37:10 but I don't know why we want to backup all nodes, aren't they supposed to be replicas of the master node?
16:37:18 m3m0, it depends
16:37:26 elasticsearch, to scale and reduce I/O,
16:37:40 we need to back up all shards
16:37:42 splits the data on multiple nodes, called shards
16:37:43 another way is less secure: write data to the master and return a positive ack to the user. *then* replicate
16:37:51 ddieterly, ++
16:38:00 I think with job session
16:38:12 the solution can be acceptable
16:38:18 because we have the same issue anyway
16:38:27 even if we use the snapshotting
16:38:30 built-in feature in es
16:38:33 that needs to be executed
16:38:39 at near the same time
16:38:42 across all the nodes
16:38:56 i'm not liking that; no guarantees
16:39:03 vannif, can you please explain the job session better to ddieterly offline?
16:39:05 depends on timing
16:39:10 ddieterly, yes I agree
16:39:22 in helion all the nodes are synced with an ntp node
16:39:25 sure
16:39:31 but yes, you are right
16:39:39 no doubt about that, it is best effort
16:39:52 ddieterly, are you comfortable testing that?
16:39:58 or do you want to go with other solutions?
16:40:00 so, #1 seems reasonable if it guarantees consistency
16:40:24 I think if the writes of es are atomic
16:40:30 the consistency should be OK
16:40:33 but
16:40:41 100% consistency cannot be guaranteed
16:40:45 :(
16:41:05 it's a computer science challenge to execute two actions at exactly the same time on multiple nodes
16:41:12 not only our problem
16:41:21 the only way that 100% consistency can be guaranteed seems to be to use the snapshot feature of es
16:41:34 ok
16:41:40 then my advice would be
16:41:46 to write a script
16:41:51 that executes the snapshot with curl
16:42:09 and then execute the backup of the data as a fs backup with the agent
16:42:18 that wouldn't require writing code
16:42:20 I think #1 is reasonable, even though it relies on some assumptions. It does not require any new backup-mode anyway. we can leave an elasticsearch-mode for direct interaction with es (i.e. request a snapshot)
16:42:22 if we can snapshot to each node, then we can just back that up with freezer
16:42:41 ddieterly, yes
16:42:43 that was #2
16:42:57 now, we can decide this even tomorrow
16:43:01 so, we need to investigate whether es can do that
16:43:04 yes
16:43:07 if so, that seems the best plan
16:43:10 ddieterly, ok
16:43:18 are you comfortable?
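A minimal sketch of the "snapshot with curl, then fs backup with the agent" plan described above. The endpoint, repository path, container and the freezer-agent flag names are assumptions for illustration and should be checked against the installed freezer release; nothing here is code agreed in the meeting.

    # Sketch of option 2: trigger the ES built-in snapshot, then back up the
    # snapshot repository directory with the freezer agent.
    import subprocess
    import requests

    ES = "http://localhost:9200"       # assumed ES endpoint
    REPO_DIR = "/mnt/es_snapshots"     # assumed shared-filesystem repository path

    # wait_for_completion=true keeps this call blocking, so the fs backup
    # below only starts once the snapshot files are fully written.
    requests.put(
        "%s/_snapshot/my_backup/snapshot_1?wait_for_completion=true" % ES
    ).raise_for_status()

    # Plain filesystem backup of the repository; flag names follow the
    # freezer-agent CLI as remembered and should be double-checked.
    subprocess.check_call([
        "freezer-agent",
        "--action", "backup",
        "--mode", "fs",
        "--backup-name", "es_snapshots",
        "--path-to-backup", REPO_DIR,
        "--container", "es_backups",   # assumed swift container
    ])

Because the ES snapshot API itself guarantees a consistent view across shards, this sequence avoids the timing problem that job sessions can only approximate.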
can we move forward?
16:43:21 if not, then see if we can do #1
16:43:30 please vannif, can you also explain job session to ddieterly offline?
16:43:46 i'll setup a meeting
16:43:52 so we can move on to the other topic
16:43:56 we can do a hangout meeting
16:43:59 so I can participate
16:44:01 as you want
16:44:04 or an irc meeting
16:44:10 google hangout?
16:44:13 yes
16:44:19 sure, i'll set that up
16:44:20 hangout I think is better
16:44:21 ok
16:44:22 ty
16:44:26 np
16:44:31 m3m0, let's run fast :)
16:44:54 #topic cinder and nova backups
16:45:03 what's the status on this?
16:45:12 Mr reldan
16:45:13 :)
16:46:18 We have nova ephemeral disk backup (not incremental), cinder native backup (cannot be done on attached volumes (should be possible from liberty)), cinder backup (non-incremental)
16:46:46 is this working now?
16:46:47 Currently we cannot make a backup of a whole vm with attached volumes
16:47:19 reldan, that's what I think we need
16:47:31 because currently no one is providing a solution for that
16:47:40 like nova vm + attached volumes
16:47:56 Yes for nova with ephemeral, No for nova with bootable cinder volume (can be done through cinder-backup)
16:48:02 can we inspect the vm and check if it has attached volumes and then execute nova and/or cinder backups?
16:48:20 m3m0, yes from the API
16:48:25 from the Nova API
16:48:37 frescof, please provide your inputs if any ^^
16:48:47 And probably we have a problem with auth_url v3
16:49:06 why?
16:50:01 I don't know. But I saw that it cannot authorize (trying to use a wrong http address or something like that)
16:50:16 mmhhh
16:50:32 m3m0: We can expect attached volumes - yes
16:50:32 we should be able to do that
16:50:54 But there is still a problem with consistency
16:51:01 reldan, at least the orchestration of backing up vms + attached volumes
16:51:10 Any backup/snapshot on an attached volume can be corrupted
16:51:11 I think it should be provided
16:51:26 why?
16:51:33 it's crash consistent anyway
16:51:58 because we use --force to do so
16:52:00 it's like backing up /var/lib/mysql with lvm without flushing the in-memory data of mysql
16:53:11 there's no other way to do that from outside the vm
16:53:15 I think >(
16:53:41 I suppose the same.
16:53:54 unless we define a new mode in freezer that inspects the architecture of the vm and executes internal and external backups
16:54:02 accordingly
16:54:06 I think
16:54:10 that makes sense
16:54:13 but it's up to the user
16:54:22 if he wants to use it
16:54:23 But if we want to have a backup that contains (let's say) 3 cinder volumes, 1 nova instance with information about where we should mount each volume - we should define such a format
16:54:56 but wait, each volume is a backup right?
16:55:29 m3m0, yes
16:55:54 If I understand it correctly, the goal is implementing a full backup of an instance with all attached volumes. In this case we should implement ephemeral disk backup, backup of each volume, and metainformation about how to restore it
16:56:00 how to reassemble the instance
16:56:15 It's like a metabackup of backups
16:56:29 the instance should be up and running again, it's not freezer's responsibility to do that
16:56:40 the jobs for restore should only contain paths
16:57:01 So if you terminate your instance - you cannot restore it?
16:57:13 nope
16:57:31 mmhhh
16:57:33 you need somewhere to restore it
16:57:44 I think probably we need to keep it a bit simple, or we go through a dark sea
16:58:06 we can have this discussion offline
16:58:33 Let's just say we have two openstack installations.
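A hedged sketch of the orchestration idea discussed above (inspect the instance through the Nova API, snapshot the instance, then back up each attached volume with cinder's force flag). This is not freezer's implementation; the endpoint, credentials and helper name are assumptions, and the client calls should be verified against the deployed OpenStack release.

    # Sketch: "nova vm + attached volumes" orchestration using the
    # python-novaclient and python-cinderclient libraries.
    from keystoneauth1 import loading, session
    from novaclient import client as nova_client
    from cinderclient import client as cinder_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://keystone:5000/v3',   # assumed keystone v3 endpoint
        username='backup', password='secret', project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    sess = session.Session(auth=auth)

    nova = nova_client.Client('2', session=sess)
    cinder = cinder_client.Client('2', session=sess)

    def backup_instance(server_id, backup_name):
        # Snapshot the instance itself (ephemeral disk) as a Glance image.
        nova.servers.create_image(server_id, '%s-image' % backup_name)
        # Inspect the attachments from the Nova API, as suggested above.
        for attachment in nova.volumes.get_server_volumes(server_id):
            # force=True allows backing up an in-use volume (Liberty and
            # later); the result is only crash consistent, which is the
            # consistency concern raised in the discussion.
            cinder.backups.create(
                attachment.volumeId,
                name='%s-%s' % (backup_name, attachment.volumeId),
                force=True)

The "metabackup" format mentioned next would then only need to record the image id, the backup ids, and where each volume was attached.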
If I understand the task correctly - we should be able to create a backup in installation1 and restore the same configuration in installation2
16:58:53 yes
16:59:01 so we can offer disaster recovery capabilities
16:59:27 let's do this
16:59:32 I disagree
16:59:33 In this case it would be great to create and discuss a blueprint
16:59:33 I'll write a bp for this stuff
16:59:43 and we can then discuss on that
16:59:46 change it and so on
16:59:49 m3m0, is that ok?
16:59:52 yes, of course
16:59:55 yes
16:59:55 ok
16:59:57 we are running late
16:59:59 let's move forward
17:00:00 yep
17:00:06 and we have 2 more topics
17:00:11 should we do it next week?
17:00:20 python freezer client and list of backups
17:00:35 let's do it for 5 minutes now
17:00:45 python freezerclient
17:00:47 let's skip it
17:00:53 but list of backups
17:00:59 it's fundamental that we have it in mitaka
17:01:03 vannif, ^^
17:01:11 is essential...
17:01:23 we need to be able to list backups and restore using the scheduler
17:01:26 yes, and it's not complicated, the ui has that functionality already
17:01:27 retrieving data at least from the api
17:01:32 m3m0, yep
17:01:35 it's a matter of replicating that
17:01:37 vannif, can you do that please?
17:01:53 or m3m0, if your workload on the web ui
17:02:00 is not huge
17:02:04 yes
17:02:09 vannif, ok thank you
17:02:14 then we'll move that stuff
17:02:18 into the python-freezerclient
17:02:20 ok
17:02:28 I
17:02:37 You
17:02:42 lol
17:02:54 I've started to look at how to use cliff for the freezerclient
17:03:03 I'm very busy but I can do that if vannif is busy as well
17:03:05 vannif, yes but we cannot do that for now
17:03:11 vannif, can do that
17:03:23 sorry
17:03:27 we cannot do that
17:03:28 for now
17:03:38 you mean no cliff?
17:03:41 we can do that after we split the code
17:03:42 yes
17:03:52 oh. ok. it's quicker then :)
17:03:52 wait wait
17:04:03 list from scheduler and the split?
17:04:19 list from scheduler can be done now
17:04:28 python-freezerclient code split can be done now
17:04:40 python-freezerclient using cliff after the split
17:04:45 we can split vannif in 2
17:04:48 haha
17:04:51 even in 3
17:04:58 we can cut it in 3
17:05:10 the italian way of doing business :P
17:05:19 and doing sausages
17:05:19 ok guys what's the verdict?
17:05:22 ok
17:05:23 so
17:05:32 vannif, implement the job listing
17:05:37 I do the python-freezerclient split
17:05:46 after that
17:06:00 ok
17:06:03 #agree
17:06:05 we can use cliff on the freezerclient
17:06:10 ++
17:06:11 ok
17:06:17 is that all?
17:06:41 yes
17:06:44 for now...
17:06:59 I'm going to write
17:07:03 ok guys thanks to all for your time
17:07:04 the bp for nova and cinder?
17:07:04 ok
17:07:08 perfect
17:07:16 do that daemontool
17:07:23 I'll do it m3m0
17:07:25 lol
17:07:27 :)
17:07:33 you please cut vannif in 3
17:07:34 #endmeeting
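The meeting only mentions looking at cliff for the future python-freezerclient, so here is a hypothetical sketch of what a cliff-based "backup list" command could look like after the code split. The class, the client attribute, the get_backups() helper and the column/field names are all illustrative assumptions, not the project's actual code; only the cliff Lister interface (take_action returning columns and rows) is the real library API.

    # Hypothetical cliff subcommand for listing backups.
    from cliff.lister import Lister


    class BackupList(Lister):
        """List backups stored in the freezer API."""

        def take_action(self, parsed_args):
            # self.app.client is assumed to be an authenticated freezer API
            # client wired up by the application; get_backups() is hypothetical.
            backups = self.app.client.get_backups()
            columns = ('Backup id', 'Backup name', 'Hostname', 'Timestamp')
            rows = [(b.get('backup_id'),
                     b.get('backup_name'),
                     b.get('hostname'),
                     b.get('timestamp')) for b in backups]
            return columns, rows

The same listing logic, minus cliff, is what the scheduler-side "list of backups" item for mitaka would need, since the web UI already retrieves the equivalent data from the API.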