From ed63ab976a2a2ca690d0d9b5c1272168dc017500 Mon Sep 17 00:00:00 2001
From: Per Cederqvist <cederp@opera.com>
Date: Sat, 7 Dec 2024 23:06:09 +0100
Subject: [PATCH] Improve README

---
 README | 195 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/README b/README
index 288647d..6916937 100644
--- a/README
+++ b/README
@@ -1,3 +1,198 @@
 Testing the remote installation of rdiff-backup:
 
 ssh somehost -a -k -x -i /root/.ssh/backupkey /opt/LYSrdiff/bin/rdiff-backup --version
+
+Setting up a new system
+=======================
+
+Where to back up data
+---------------------
+
+The lysrdiff system assumes that you back up data to a set of disks.
+Each disk can be split into several partitions. Additionally, there can
+be several copies of each disk. Disks are given a number starting
+with 1. There is always a copy named "perm" which is permanently
+mounted. Other copies, named A, B, C and so on, can be unmounted for
+off-site backups. Partitions are given a number starting with 0.
+
+The backup jobs are distributed among the partitions.
+
+Putting this together, a system may have 2 disks, each with 2
+partitions, and 3 copies named "perm", "A" and "B". The partitions
+would then be named like this:
+
+    1/perm/0
+    1/perm/1
+    1/A/0
+    1/A/1
+    1/B/0
+    1/B/1
+    2/perm/0
+    2/perm/1
+    2/A/0
+    2/A/1
+    2/B/0
+    2/B/1
+
+When a disk is present, each partition should be mounted on
+/lysrdiff/$disk/$copy/$part, for example /lysrdiff/1/perm/0.
+
+The lysrdiff-label-disk script labels a disk (or, rather, a filesystem
+on a partition). Labeling a disk writes the partition name to
+/lysrdiff/$disk/$copy/$part/lysrdiff.id; this ID is checked to ensure
+that a disk has not been mounted on the wrong mount point.
+
+Labeling also creates a README file and a few spacefiller files that
+can be removed in an emergency if the disk becomes full. (rdiff-backup
+can get into a state where it cannot remove data because the disk is
+full, so it is good to have some spare data to remove.)
+
+The file /opt/LYSrdiff/var/newtasks should name the partition where
+new backup jobs should be stored by default, in "$disk/$part"
+format. For a new system, you would likely start by doing:
+
+    echo 1/0 > /opt/LYSrdiff/var/newtasks
+
+The /lysrdiff/$disk/perm/$part/lysrdiff/tasks file contains a list of
+all backup jobs that are stored on that particular partition. It has
+the same format as /opt/LYSrdiff/var/tasks, which is defined in the
+next section.
+
+What to back up
+---------------
+
+The scripts fetch-backup-work and fetch-work-pcfritz are two
+site-specific scripts that dynamically figure out what to back up.
+For a new site, the first task is to write such a script. The end
+result of running the script is the file /opt/LYSrdiff/var/tasks.
+This file specifies all the backup tasks.
+
+Each task is defined by a single line with four whitespace-separated
+fields:
+
+- category: often a host name, but it may also be a "virtual" category
+  such as "homes" for all home directories or "mail" for mail spools.
+
+- subcategory: for the "homes" or "mail" category, this would
+  typically be a username. It can be any string, really: the
+  important thing is that the combination of category and subcategory
+  is unique.
+
+- host: the name of the host that holds the data. It must be
+  possible to log in as root via ssh using the /root/.ssh/backupkey
+  identity.
+
+- directory: the directory on that host that should be backed up.
+
+Distributing tasks to partitions
+--------------------------------
+
+The distribute-tasks script, when given the -i flag, reads the
+/opt/LYSrdiff/var/tasks file and all the
+/lysrdiff/$disk/perm/$part/lysrdiff/tasks files. It finds any new
+backup jobs and assigns them to the partition specified by the
+"newtasks" file (see above). It also detects backup jobs that no
+longer exist and removes them from the partition task lists.
+
+Whenever you use a *fetch* script to figure out what to back up, you
+should run "distribute-tasks -i $disk/$part" to assign the jobs to a
+partition.
+
+Without the "-i" flag, the script just reorders the task files on the
+specified partitions to ensure a fair distribution of backup times.
+
+Backups
+=======
+
+Performing a backup
+-------------------
+
+backup-all is the normal way to perform a backup. Name the disk and
+partition on the command line:
+
+    backup-all 1/0
+
+You can also use the --failed, --retry, --new or --continue options
+(see --help).
+
+The recommended setup is to run backup-all from cron, once per day.
+
+If you have a situation where backups take more than 24 hours, you may
+wish to use backup-repeatedly instead. It will start a new backup
+cycle as soon as the previous one has ended. But, really, you should
+find a better solution. Perhaps you can set up multiple LYSrdiff systems
+that each back up part of your data, so that each system can finish in
+less than 24 hours?
+
+Interrupting a backup
+---------------------
+
+To avoid data loss, you should not interrupt rdiff-backup, which is
+the tool used to back up each job. Instead, LYSrdiff provides a few
+ways to stop a backup in a controlled fashion.
+
+** Holding **
+
+Create /opt/LYSrdiff/etc/hold to temporarily pause backups. If that
+file exists when backup-all is about to start a backup, it will print
+a message and wait until the file disappears. This can be useful if
+you want to unmount a copy of a disk for transport to off-site storage.
+
+** Stopping **
+
+Create /opt/LYSrdiff/etc/stop to cause backup-all to exit cleanly once
+the current backup job has finished.
+
+** Finishing **
+
+Create /opt/LYSrdiff/etc/finish to cause backup-repeatedly to exit
+cleanly once the current backup cycle has finished.
+
+Backup status
+-------------
+
+The "lysrdiff-status" program reports on the status of the backup
+system. First, it runs "df -h" on all mounted backup partitions so
+you get an overview of how full the partitions are. Then, it provides
+a one-line summary of the backup jobs of each partition. This can
+look like this:
+
+1/0: Tasks: 324 Fresh: 320 1day: 2 Stale: 1 Tot: 323 Warn: 1 Err: 1
+
+This means that there are 324 tasks assigned to disk 1, partition 0.
+320 backups completed in the last 24 hours, 2 are between 24 and 48
+hours old, and 1 backup is older than that. In total, 323 backup
+tasks have completed at least once. (Since there are 324 tasks, that
+means that 1 task never completed -- hopefully because it was just
+assigned to the disk, but perhaps because there is a problem that
+prevents it from being completed.)
+
+1 backup job produced a warning, which might mean that 1 or more of
+the files were not properly backed up. 1 backup job failed
+completely, meaning no files were backed up.
+
+The command then lists all backup jobs that resulted in a warning,
+followed by all jobs that failed, followed by the 5 oldest backups,
+the oldest backup per partition, and the newest backup per partition.
+Finally, it lists all backups that are currently in progress.
+
+Progress console
+----------------
+
+lysrdiff-monitord.py is an optional monitoring daemon. If it is
+running, backup jobs report progress information to it. That
+information can be viewed by running "telnet localhost 9934"; it is
+updated while backups are running, which makes it a convenient way
+to see what is going on.
+
+Adding more sources
+===================
+
+To start backing up a new system, or a new directory on an old system,
+you have to run your *fetch* script again. You may also need to edit
+it first.
+
+After running the *fetch* script, you need to assign the new task(s)
+to a lysrdiff backup partition. First, check that
+/opt/LYSrdiff/var/newtasks names the proper partition. Then, run
+"distribute-tasks -i $disk/$part" to assign the task to that partition.
-- 
GitLab