Answers:
I restored/saved all files but dar reported that some files have been ignored, what are those ignored files?
When restoring or saving, all files are considered by default. But if you specify some files to restore or to save, all other files are "ignored"; this is the case when using the -P, -X, -I, -g, -[ or -] options.
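For example, the following hypothetical restoration restricts the operation to one directory tree; every other entry of the backup is then counted as ignored:
dar -x my_backup -R / -g usr/local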
Dar hangs when using it with pipes, why?
Dar can produce backups on its standard output if you give '-' as basename, but it cannot read a backup from its standard input in direct access mode. To feed a backup to dar through pipes, you either need dar_slave and two pipes, or you must use the sequential mode (--sequential-read option), which gives slow restoration when only a few files are restored, compared to the (default) direct access mode. To use dar with dar_slave over pipes in direct access mode (which is the most efficient way to proceed), see the detailed notes, more precisely the dar and ssh note.
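Here is a minimal sketch of the two-pipe setup with dar_slave (paths and basename are illustrative):
mkfifo /tmp/todar /tmp/toslave
dar_slave my_backup < /tmp/toslave > /tmp/todar &
dar -x - -i /tmp/todar -o /tmp/toslave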
Why does dar restore more inodes than the number of files I asked for?
If you restore, for example, the file usr/bin/emacs, dar will first restore usr (if the directory already exists, its date and ownership get restored, while all existing files in that directory stay preserved), then usr/bin will be restored, and last usr/bin/emacs. Thus 3 inodes have been restored or modified while only one file was asked for restoration.
While compiling dar I get the following message: "g++: /lib/libattr.a: No such file or directory", what can I do?
The problem comes from an incoherence in your distro (Redhat and Slackware seem(ed) concerned at least): dar (libtool) finds the /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../libattr.la file to link with. This file defines where the libattr static and dynamic libraries are located, and in this file both are expected to be found under /lib. While the dynamic libattr is there, the static version has been moved to /usr/lib. A workaround is to make a symbolic link:
ln -s /usr/lib/libattr.a /lib/libattr.a
I cannot find the binary package for my distro, where should I look?
For any binary package, ask your distro maintainer to include dar (if not already done), and check the web site of your preferred distro for a dar package.
Can I use different filters between a full backup and a differential backup? Would dar not consider some files not included in the filter to be deleted?
Yes, you can. No, there is no risk of dar deleting the files that were not selected for the differential backup. Here is the way dar works:
During a backup process, when a file is ignored due to filter exclusion, an "ignored" entry is added to the catalogue. At the end of the backup, dar compares both catalogues, the one of reference and the new one built during the backup process, and adds a "detruit" entry (which means "destroyed" in French) when an entry of the reference is not present in the new catalogue. Thus, if an "ignored" entry is present, no "detruit" will be added for that name. Then all "ignored" entries are removed and the catalogue is written down at the end of the backup.
Once in action, dar makes the whole system slower and slower, then it stops with the message "killed"! How to overcome this problem?
Dar needs virtual memory to work. Virtual memory is the RAM + swap space. Dar's memory requirement grows with the number of files saved, not with the amount of data saved. If you have a few huge files you have little chance of meeting any memory limitation problem. At the opposite, saving a plethora of files (either big or small) will make dar request an increasing amount of virtual memory. Dar needs this memory to build the catalogue (the table of contents) of the backup it creates. The same applies to differential backups, except that dar also needs to load in memory the catalogue of the backup of reference, which most of the time makes dar use twice as much memory for a differential backup as for a full backup.
Anyway, the solution is:
- Read the limitations document to understand the problem and be aware of the limitations you will introduce at step 3, below.
- If you can, add swap space to your system (under Linux, you can either add a swap partition or a swap file, which is less constraining but also a bit less efficient). Bob Barry provided a script that gives a raw estimation of the required virtual memory (doc/samples/dar_rqck.bash). It worked well with dar 2.2.x, but since then and the newly added features, the amount of metadata per file is variable: the memory requirement per file also depends on the presence and amount of Extended Attributes and Filesystem Specific Attributes, which change from file to file.
- If this is not enough, or if you don't want to or cannot add swap space, recompile dar giving the --enable-mode=64 argument to the configure script. Note that since release 2.6.x this is the default compilation mode, so you should already be covered.
- If this is not enough, and you have some money, you can add some RAM to your system.
- If all that fails, ask for support on the dar-support mailing-list.
Last, there is always the workaround of making several smaller backups of the files to save; for example, one backup for all that is in /usr/local, another one for all that is in /var, and so on. These backups can be full or differential. The drawback is small, as you can store these backups side by side and use them at will. Moreover, you can feed a unique dar_manager database with all these different backups, which will hide the fact that there are several full and several differential backups concerning different sets of files.
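A rough sketch of this approach, with illustrative basenames and database name:
dar -c usr_local_backup -R / -g usr/local
dar -c var_backup -R / -g var
dar_manager -C my_backups.dmd
dar_manager -B my_backups.dmd -A usr_local_backup
dar_manager -B my_backups.dmd -A var_backup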
I have a backup, how can I change the size of its slices?
dar_xform is your friend!
dar_xform -s <size> original_backup new_backup
dar_xform will create a new backup with slices of the requested size (you can also make use of the -S option for the first slice). Note that dar_xform neither decrypts the backup nor uncompresses it, this is thus a very fast processing. See the dar_xform man page for more.
I have a backup in a single file, how can I split it into several slices?
dar_xform is your friend! See just above.
I have a backup in several slices, how can I merge them all into a single file?
dar_xform is your friend!
dar_xform original_backup new_backup
dar_xform without the -s option creates a single-sliced backup. See the dar_xform man page for more.
I have an encrypted backup, how can I change its passphrase or encryption algorithm?
The merging feature lets you do that. Merging has two roles: putting into one backup the contents of two different backups, and at the same time filtering out some files you decide not to include in the resulting backup. The merging feature can take two, but also only one backup as input. This is what we will use here, without any filter, to keep all saved files:
- a single input (our original backup)
- no file filtering (so we keep all the files)
- keeping files compressed (no decompression/recompression) to speed up the process (-ak option)
dar -+ new_backup -A original_backup -K "<new_algo>:new pass" -ak
If you don't want to have the password in clear on the command-line (a command-line can be seen, for example with top or ps, by other users), simply provide "<algo>:" and dar will ask you for the password on the fly. If using blowfish you can then just provide ":" for the keys. Note that before release 2.5.0, the -J option was needed to provide the password of the source backup. Since then, without the -J option, dar asks interactively for the password of the backup to read. You can still use the -J option to provide the password from a DCF file and this way avoid dar asking for it interactively.
Note that you can also change slicing of the backup at the same time thanks to -s and -S options:
dar -+ new_backup -A original_backup -K ":" -ak -s 1G
I have a backup, how can I change its compression algorithm?
Same thing as above: we will use the merging feature:
to use bzip2 compression:
dar -+ new_backup -A original_backup -zbzip2
to use gzip compression:
dar -+ new_backup -A original_backup -zgzip
to use lzo compression, use -zlzo; for LZ4 use -zlz4; for zstd use -zzstd, and so on.
To use no compression at all, do not add any -z option, or exclude all files from compression (-Z "*"):
dar -+ new_backup -A original_backup
Note that you can also change encryption scheme and slicing at the same time you change compression:
dar -+ new_backup -A original_backup -zbzip2 -K ":" -J ":" -s 1G
Which options can I use with which commands?
DAR provides eight commands:
- -c : to create a new backup
- -x : to extract files from a given backup
- -l : to list the contents of a given backup
- -d : to compare the contents of a backup with the filesystem
- -t : to test the internal coherence of a given backup
- -C : to isolate a backup (extract its contents to a usually small file) or make a snapshot of the current filesystem
- -+ : to merge two backups into one or create a sub-backup from one or two other ones
- -y : to repair a backup
For each command listed above, here follows the available options (those marked OK):
short option | long option | -c | -x | -l | -d | -t | -C | -+ | -y |
---|---|---|---|---|---|---|---|---|---|
-v | --verbose | OK | OK | OK | OK | OK | OK | OK | OK |
-vs | --verbose=s | OK | OK | -- | OK | OK | -- | OK | OK |
-b | --beep | OK | OK | OK | OK | OK | OK | OK | OK |
-n | --no-overwrite | OK | OK | -- | -- | -- | OK | OK | OK |
-w | --no-warn | OK | OK | -- | -- | -- | OK | OK | OK |
-wa | --no-warn=all | -- | OK | -- | -- | -- | -- | -- | -- |
-A | --ref | OK | OK | -- | OK | OK | OK | OK | OK |
-R | --fs-root | OK | OK | -- | OK | -- | -- | -- | -- |
-X | --exclude | OK | OK | OK | OK | OK | -- | OK | -- |
-I | --include | OK | OK | OK | OK | OK | -- | OK | -- |
-P | --prune | OK | OK | OK | OK | OK | -- | OK | -- |
-g | --go-into | OK | OK | OK | OK | OK | -- | OK | -- |
-] | --exclude-from-file | OK | OK | OK | OK | OK | -- | OK | -- |
-[ | --include-from-file | OK | OK | OK | OK | OK | -- | OK | -- |
-u | --exclude-ea | OK | OK | -- | -- | -- | -- | OK | -- |
-U | --include-ea | OK | OK | -- | -- | -- | -- | OK | -- |
-i | --input | OK | OK | OK | OK | OK | OK | OK | -- |
-o | --output | OK | OK | OK | OK | OK | OK | OK | -- |
-O | --comparison-field | OK | OK | -- | OK | -- | -- | -- | -- |
-H | --hour | OK | OK | -- | -- | -- | -- | -- | -- |
-E | --execute | OK | OK | OK | OK | OK | OK | OK | OK |
-F | --ref-execute | OK | -- | -- | -- | -- | OK | OK | OK |
-K | --key | OK | OK | OK | OK | OK | OK | OK | OK |
-J | --ref-key | OK | -- | -- | -- | -- | OK | OK | OK |
-# | --crypto-block | OK | OK | OK | OK | OK | OK | OK | OK |
-* | --ref-crypto-block | OK | -- | -- | -- | -- | OK | OK | OK |
-B | --batch | OK | OK | OK | OK | OK | OK | OK | OK |
-N | --noconf | OK | OK | OK | OK | OK | OK | OK | OK |
-e | --empty | OK | -- | -- | -- | -- | OK | OK | OK |
-aSI | --alter=SI | OK | OK | OK | OK | OK | OK | OK | OK |
-abinary | --alter=binary | OK | OK | OK | OK | OK | OK | OK | OK |
-Q | | OK | OK | OK | OK | OK | OK | OK | OK |
-aa | --alter=atime | OK | -- | -- | OK | -- | -- | -- | -- |
-ac | --alter=ctime | OK | -- | -- | OK | -- | -- | -- | -- |
-am | --alter=mask | OK | OK | OK | OK | OK | OK | OK | -- |
-an | --alter=no-case | OK | OK | OK | OK | OK | OK | OK | -- |
-acase | --alter=case | OK | OK | OK | OK | OK | OK | OK | -- |
-ar | --alter=regex | OK | OK | OK | OK | OK | OK | OK | -- |
-ag | --alter=glob | OK | OK | OK | OK | OK | OK | OK | -- |
-z | --compression | OK | -- | -- | -- | -- | OK | OK | -- |
-s | --slice | OK | -- | -- | -- | -- | OK | OK | OK |
-S | --first-slice | OK | -- | -- | -- | -- | OK | OK | OK |
-p | --pause | OK | -- | -- | -- | -- | OK | OK | OK |
-@ | --aux | OK | -- | -- | -- | -- | -- | OK | -- |
-$ | --aux-key | -- | -- | -- | -- | -- | -- | OK | -- |
-~ | --aux-execute | -- | -- | -- | -- | -- | -- | OK | -- |
-% | --aux-crypto-block | -- | -- | -- | -- | -- | -- | OK | -- |
-D | --empty-dir | OK | OK | -- | -- | -- | -- | OK | -- |
-Z | --exclude-compression | OK | -- | -- | -- | -- | -- | OK | -- |
-Y | --include-compression | OK | -- | -- | -- | -- | -- | OK | -- |
-m | --mincompr | OK | -- | -- | -- | -- | -- | OK | -- |
-ak | --alter=keep-compressed | -- | -- | -- | -- | -- | -- | OK | -- |
-af | --alter=fixed-date | OK | -- | -- | -- | -- | -- | -- | -- |
| --nodump | OK | -- | -- | -- | -- | -- | -- | -- |
-M | --no-mount-points | OK | -- | -- | -- | -- | -- | -- | -- |
-, | --cache-directory-tagging | OK | -- | -- | -- | -- | -- | -- | -- |
-k | --deleted | -- | OK | -- | -- | -- | -- | -- | -- |
-r | --recent | -- | OK | -- | -- | -- | -- | -- | -- |
-f | --flat | -- | OK | -- | -- | -- | -- | -- | -- |
-ae | --alter=erase_ea | -- | OK | -- | -- | -- | -- | -- | -- |
-T | --list-format | -- | -- | OK | -- | -- | -- | -- | -- |
-as | --alter=saved | -- | -- | OK | -- | -- | -- | -- | -- |
-ad | --alter=decremental | -- | -- | -- | -- | -- | -- | OK | -- |
-q | --quiet | OK | OK | OK | OK | OK | OK | OK | OK |
-/ | --overwriting-policy | -- | OK | -- | -- | -- | -- | OK | -- |
-< | --backup-hook-include | OK | -- | -- | -- | -- | -- | -- | -- |
-> | --backup-hook-exclude | OK | -- | -- | -- | -- | -- | -- | -- |
-= | --backup-hook-execute | OK | -- | -- | -- | -- | -- | -- | -- |
-ai | --alter=ignore-unknown-inode-type | OK | -- | -- | -- | -- | -- | -- | -- |
-at | --alter=tape-marks | OK | -- | -- | -- | -- | -- | OK | -- |
-0 | --sequential-read | OK | OK | OK | OK | OK | OK | -- | -- |
-; | --min-digits | OK | OK | OK | OK | OK | OK | OK | OK |
-1 | --sparse-file-min-size | OK | -- | -- | -- | -- | -- | OK | -- |
-ah | --alter=hole-recheck | -- | -- | -- | -- | -- | -- | OK | -- |
-^ | --slice-mode | OK | -- | -- | -- | -- | OK | OK | OK |
-_ | --retry-on-change | OK | -- | -- | -- | -- | -- | -- | -- |
-asecu | --alter=secu | OK | -- | -- | -- | -- | -- | -- | -- |
-. | --user-comment | OK | -- | -- | -- | -- | OK | OK | -- |
-3 | --hash | OK | -- | -- | -- | -- | OK | OK | OK |
-2 | --dirty-behavior | -- | OK | -- | -- | -- | -- | -- | -- |
-al | --alter=lax | -- | OK | -- | -- | -- | -- | -- | OK |
-alist-ea | --alter=list-ea | -- | -- | OK | -- | -- | -- | -- | -- |
-4 | --fsa-scope | OK | OK | -- | OK | -- | -- | OK | -- |
-5 | --exclude-by-ra | OK | -- | -- | -- | -- | -- | -- | -- |
-7 | --sign | OK | -- | -- | -- | -- | OK | OK | OK |
-' | --modified-data-detection | OK | -- | -- | -- | -- | -- | -- | -- |
-{ | --include-delta-sig | OK | -- | OK | -- | -- | OK | -- | -- |
-} | --exclude-delta-sig | OK | -- | OK | -- | -- | OK | -- | -- |
-8 | --delta | OK | -- | OK | -- | -- | OK | -- | -- |
-6 | --delta-sig-min-size | OK | -- | OK | -- | -- | OK | -- | -- |
-az | --alter=zeroing-negative-dates | OK | -- | -- | -- | -- | -- | -- | -- |
-\ | --ignored-as-symlink | OK | -- | -- | -- | -- | -- | -- | -- |
-T | --kdf-param | OK | -- | OK | -- | -- | OK | -- | -- |
-aduc | --alter=duc | OK | OK | OK | OK | OK | OK | OK | OK |
-G | --multi-thread | OK | OK | OK | OK | OK | OK | OK | OK |
-j | --network-retry-delay | OK | OK | OK | OK | OK | OK | OK | -- |
-afile-auth | --alter=file-authentication | OK | OK | OK | OK | OK | OK | OK | -- |
-ab | --alter=blind-to-signatures | OK | OK | OK | OK | OK | OK | OK | -- |
-aheader | --alter=header | -- | -- | OK | -- | -- | -- | -- | -- |
Why does dar report corruption of the backup I have transferred with FTP?
Dar backups are binary files; they must be transferred in binary mode when using FTP. This is done the following way with the command-line ftp client:
ftp <somewhere>
<login>
<password>
bin
put <file>
get <file>
bye
If you transfer a backup (or any other binary file) in ascii mode (the opposite of binary mode), the 8th bit of each byte will be lost and the backup will become impossible to recover (due to the destruction of this information). Be very careful to test your backup after transferring it back to your host, to be sure you can delete the original file.
Why does DAR save UID/GID instead of plain user and group names?
A file's properties do not contain the name of its owner nor the name of its group; instead they contain two numbers, the user ID and the group ID (UID & GID for short). The /etc/passwd file associates a name and some other properties to these numbers (like the login shell, the home directory, the password, see also /etc/shadow). Thus, when you list a directory (with the 'ls' command for example, or with any GUI program), the listing application opens each directory, where it finds a list of names with an associated inode number; it then fetches the inode attributes of each file and looks, among other information, for the UID and the GID. To be able to display the real user name and group name, the listing application uses a well-defined standard C library call that does the lookup in /etc/passwd, possibly in NIS if configured, and in any other additional system [this way applications do not have to bother with the many possible system configurations: the same API is used whatever the system is]. The lookup returns the name if it exists, and the listing application displays, for each file found in a directory, the attributes and the user and group names as returned by the system, instead of the UID and GID.
As you can see, the user name and group name are not part of any file attribute, but UID and GID *are*. Dar is mainly a backup tool: it preserves the file properties as much as possible, to be able to restore them as close as possible to their original state. Thus a file saved with UID=3 will be restored with UID=3. The name corresponding to UID 3 may exist or not, may exist and be the same, or may exist and be different; the file will anyway be restored with UID 3.
Scenario with dar's way of restoring
Thus, when doing backup and restoration of a crashed system, you can be confident that the restoration will not interfere with the bootable system you have used to launch dar to restore your disk. Assume UID 1 is labeled 'bin' in your real crashed system, but is labeled 'admin' in the boot system, while UID 2 is labeled 'bin' in this boot system: files owned by bin in the system to restore will be restored under UID 1, not UID 2 which is used by the temporary boot system. At that time, just after restoration and still running from the boot system, if you do a 'ls' you will see that the original files owned by 'bin' are now owned by user 'admin'.
This is really a mirage: in your restoration you will also have restored the /etc/passwd file and other system configuration files (like NIS configuration files if they were used), so at reboot time on the newly restored real system, UID 1 will be associated back to user 'bin' as expected, and files originally owned by user bin will be listed as owned by bin, as expected.
Scenario with plain name way of restoring
If dar had done otherwise, restoring the files owned by 'bin' to the UID corresponding to 'bin', these files would have been given UID 2 (the one used by the temporary bootable system used to launch dar). But once the real restored system had been launched, this UID 2 would have mapped to some other user and not to 'bin', which is mapped to UID 1 in the restored /etc/passwd.
Now, if you want to change some UID/GID when moving a set of files from one live system to another, there is no problem as long as you are not running dar under the 'root' account. Accounts other than 'root' are usually not allowed to modify UID/GID, thus files restored by dar will get the group and user ownership of the dar process, that is the ones of the user that launched dar.
But if you really need to move a directory tree containing a set of files with different ownerships, and you want to preserve these ownerships from one live system to another while the corresponding UID/GID do not match between the two systems, dar can still help you:
- Save your directory tree on the source live system
- From the root account in the destination live system do the following:
- restore the backup contents into an empty directory
- change the UID and GID of files according to the ones used by the destination filesystem with the commands:
find /path/to/restored/backup -uid <old UID> -print -exec chown <new name> {} \;
find /path/to/restored/backup -gid <old GID> -print -exec chgrp <new name> {} \;
The first command lets you remap a UID to another for all files under the /path/to/restored/backup directory.
The second command lets you remap a GID to another for all files under the /path/to/restored/backup directory.
For example, you have on the source system three users: Jacques (UID 100), Pierre (UID 101) and Paul (UID 102), but on the destination system these same users are mapped to different UIDs: Pierre has UID 100, Paul has UID 101 and Jacques has UID 102.
We temporarily need an unused UID on the destination system; we will assume UID 680 is not used. Then, after the backup restoration in the directory /tmp/A, we will do the following:
find /tmp/A -uid 100 -print -exec chown 680 {} \;
find /tmp/A -uid 101 -print -exec chown pierre {} \;
find /tmp/A -uid 102 -print -exec chown paul {} \;
find /tmp/A -uid 680 -print -exec chown jacques {} \;
which is:
- change files of UID 100 to UID 680 (the files of Jacques are now under the temporary UID 680 and UID 100 is now freed)
- change files of UID 101 to UID 100 (the files of Pierre get their UID of the destination live system, UID 101 is now freed)
- change files of UID 102 to UID 101 (the files of Paul get their UID of the destination live system, UID 102 is now freed)
- change files of UID 680 to UID 102 (the files of Jacques, which had been temporarily moved to UID 680, are now set to their UID on the destination live system; UID 680 is no longer used).
You can then move the modified files to the appropriate destination, or make a new dar backup to be restored in the appropriate place, if you want to use some of dar's features, like for example restoring only files that are more recent than those present on the filesystem.
Dar_manager does not accept encrypted backups, how to work around this?
Yes, that's true, dar_manager does not accept encrypted backups. The first reason is that, as a dar_manager database cannot be encrypted, it would not be very fair to add encrypted backups to it. The second reason is that the dar_manager database would have to hold the key of each encrypted backup, making the database the weakest point of your data security: breaking the database encryption would then provide access to every encryption key, and, with access to the original backups, to the data of any backup added to the database.
To workaround this, you can proceed as follows:
- isolate your encrypted backup into an unencrypted 'isolated catalogue': do not use the -K option while isolating. Without the -J option, dar will prompt for the password of the encrypted backup. For automated processes, you are encouraged to use a DCF file with restricted permissions containing the '-J <key>' option to be passed to dar, then instruct dar to read that file thanks to the -B option.
- add these extracted catalogue to the dar_manager database of your choice,
- change the name and path of the added catalogues to point to your real encrypted backups (-b and -p options of dar_manager).
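A hypothetical sketch of these three steps (basenames, database name and archive number are illustrative; check the dar_manager man page for the exact -b and -p syntax):
dar -C clear_cat -A encrypted_backup
dar_manager -C my_db.dmd
dar_manager -B my_db.dmd -A clear_cat
dar_manager -B my_db.dmd -b 1 encrypted_backup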
Note that as the database is not encrypted, this will expose the file listing (not the files' contents) of your encrypted backups to anyone able to read the database; it is thus recommended to set restrictive permissions on this database file.
When the time comes to use dar_manager to restore some files, you will have to make dar_manager pass the key to dar, for it to be able to restore the needed files from the backup. This can be done in several ways: dar_manager's command-line, the dar_manager database, or a DCF file.
- dar_manager's command-line: simply pass -e "-K <key>" to dar_manager. Note that this will expose the key twice: on dar_manager's command-line and on dar's command-line.
- dar_manager database: the database can store some constant options to be passed to dar. This is done using the -o option or the -i option. The -o option exposes the arguments you want to be passed to dar, because they show on dar_manager's command-line, while the -i option lets you do the same thing in an interactive manner, which is a better choice.
- A better way is to use a DCF file with restrictive permissions. This file will contain the '-K <key>' option for dar to be able to read the encrypted backups, and dar_manager will ask dar to read this file thanks to the '-B <filename>' option you will have given either on dar_manager's command-line (-e "-B <filename>" ...) or from the stored options in the database (-o -B <filename>).
- The best way is to let dar_manager pass the -K option to dar, but without a password: simply pass the -e "-K :" option to dar_manager. When dar gets the -K option with ":" as argument, it dynamically asks for the password and stores it in secure memory.
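For example, restoring a file through dar_manager while letting dar prompt for the passphrase (database name and restored path are illustrative):
dar_manager -B my_db.dmd -e "-K :" -r etc/passwd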
HOWTO build portable binaries on OS X?
Pure-static executables aren't used on OS X; however, Mac OS X does have other ways to build portable binaries. The answer below comes from Dave Vasilevsky in an email to the dar-support mailing-list, I let him explain how to do it:
First, you have to make sure that dar only uses operating-system libraries that exist on the oldest version of OS X that you care about. You do this by specifying one of Apple's SDKs, for example:
export CPPFLAGS="-isysroot /Developer/SDKs/MacOSX10.2.8.sdk"
export LDFLAGS="-Wl,-syslibroot,/Developer/SDKs/MacOSX10.2.8.sdk"
Second, you have to make sure that any non-system libraries that dar links to are linked in statically. To do this edit dar/src/dar_suite/Makefile, changing LDADD to '../libdar/.libs/libdar.a'. If any other non-system libs are used (such as gettext), change the makefiles so they are also linked in statically. Apple should really give us a way to force the linker to do this automatically!
Some caveats:
- If you build for 10.3 or lower, you will not get EA support, and therefore you will not be able to save special Mac information like resource forks.
- To work on both ppc and x86 Macs, you need to build a universal binary. For instructions, use Google :-)
- To make a 10.2-compatible binary, you must build with GCC 3.3.
- These instructions won't work for the 10.1 SDK, that one is harder to use.
Can dar tell me in which slice a given file is located?
Not directly, due to dar's design. However you can list a whole backup and see in which slice(s) a file is located:
# dar -l test -Tslice -g etc/passwd
Slice(s) |[Data ][D][ EA ][FSA][Compr][S]|Permission| Filename
--------+--------------------------------+----------+-----------------------------
1 [Saved][-] [-L-][ 69%][ ] drwxr-xr-x etc
2 [Saved][ ] [-L-][ 63%][ ] -rw-r--r-- etc/passwd
-----
All displayed files have their data in slice range [1-2]
-----
#
Why cannot I merge two isolated catalogues?
Since version 2.4.0, an isolated catalogue can also be used to rescue a corrupted internal catalogue of the backup it has been isolated from. For that feature to be possible, a mechanism lets dar know whether a given isolated catalogue and a given backup correspond to the same contents. Merging two isolated catalogues would break this feature, as the resulting catalogue would not match any real backup and could only be used as reference for a differential backup.
How to use the full power of my multi-processor computer?
Since release 2.7.0 it is possible to have dar efficiently use many threads, at two independent levels:
- encryption: you can specify the number of threads used to cipher/decipher a backup. Note however that during the tests done for 2.7.0 validation, it was observed that having more than two threads for encryption does not give better results than using only two when compression is used, because most of the time compression is more CPU intensive than encryption (it all depends on the chosen algorithms, of course).
- compression: before release 2.7.0, compression was done per file in streaming mode. In this mode, to compress data you need to know the result of the compression of the data located before it; this brings a good compression ratio but is impossible to parallelize. To be able to compress in parallel, one needs to split the data into blocks and compress the blocks independently. There you can use a lot of threads, up to the point where disk I/O becomes the bottleneck: adding more compression threads will then not change anything. The drawback of compressing per block is less the compression ratio, which is only slightly worse than in streaming mode, than the memory requirement: one block of clear data plus the resulting compressed data, times the number of threads. To avoid having any thread waiting for disk I/O, libdar even stores a few more memory blocks than the number of threads.
To activate multi-threading with dar, use the -G option; read the dar man page for all details about the way to define the number of encryption threads and the number of compression threads, as well as the compression block size to use.
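A minimal sketch, assuming the simplest form where -G takes a single thread count (basename, path and algorithm are illustrative; see the man page for the exact syntax to tune encryption threads, compression threads and block size):
dar -c my_backup -R /data -zzstd -G 4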
libdar is the part of dar's source code that has been rewritten to be usable by external programs (like kdar). It has been modified to be usable in a multi-threaded environment, thus, *yes*, libdar is thread-safe. However, thread-safe does not mean that you do not have to take some precautions in your programs while using libdar (or any other library).
Care must thus be taken that two different threads do not act on the same variables/objects at the same time. This is made possible with POSIX mutexes, which define a portion of code (known as a critical section) that cannot be entered by more than one thread at a time.
A few objects provided by the libdar API support concurrent access from several threads; read the API documentation for more.
How to solve "configure: error: Cannot find size_t type"?
This error shows when you lack support for C++ compilation. Check that the gcc compiler has been compiled with C++ support activated, or, if you are using a gcc binary from a distro, double check that you have installed the C++ support for gcc.
Why did dar become much slower since release 2.4.0?
This is the drawback of new features!
- Especially, to be able to read a dar backup through pipes in sequential mode, dar inserts so-called "escape sequences" (also referred to as tape marks) to know, for example, when a new file starts. This way dar can skip to the next mark upon backup corruption, or when a given file does not have to be restored. However, if such a sequence of bytes is found inside a file's data, it must be modified so it does not collide with real escape sequences. This leads dar to inspect all data added to a backup for such sequences of bytes, instead of just copying the data to the backup (possibly compressing and ciphering it).
- The other feature that brings an important overhead is the sparse file detection mechanism. To be able to detect a hole in a file and store it into the backup, dar needs, here too, to inspect each file's data.
You can disable both of these features using respectively the -at option, which suppresses "tape marks" (just another name for escape sequences) but does not allow the generated backup to be used in sequential read mode, and the -1 0 option, which completely disables the sparse file detection. The execution time then becomes the same as with dar 2.3.x releases.
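For example, a backup with both features disabled (basename and path are illustrative):
dar -c my_backup -R /data -at -1 0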
Why did dar become much slower since release 2.5.0?
This is again the drawback of new features!
- The first feature that drains time is Filesystem Specific Attributes (FSA), as it requires new system calls for each file to save. This has little impact when saving a lot of big files, but becomes visible when saving a lot of tiny files or directories.
- The second feature is the use of the fadvise() system call to preserve cache usage. In other words, dar tells the system that it no longer needs a file once it has been read (backup) or written (restoration); this has the advantage of reducing the cache pressure caused by dar, to the benefit of other running processes. The idea here is to preserve, as much as possible, a live operating system from being affected by a running dar backup. The consequence is that, when running dar a second time on the same set of files, with dar 2.4.x and below the data to save was most of the time still in the cache, which could lead to very fast execution, while with dar 2.5.x the data to save may have been flushed out of the cache in favor of data more important to other applications. The second time dar is run, the data has to be read again from disk, which does not bring the same very fast execution as reading from the cache.
You can disable both of these features: fadvise() usage can be disabled at compilation time by giving --disable-fadvise to the ./configure script, while FSA support can be disabled at any time by adding the --fsa-scope=none option to dar. The execution time then becomes the same as with dar 2.4.x releases.
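For example, to run a backup without FSA support (basename and path are illustrative):
dar -c my_backup -R /data --fsa-scope=none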
Have a look at the dar-support mailing-list archive, and if you cannot find any answer to your problem, feel free to send an email to this mailing-list describing your problem/need.
Why does dar tell me that it failed to open a directory, while I have excluded this directory?
Reading the contents of a directory is done using the usual system calls (opendir/readdir/closedir). The first call (opendir) lets dar designate the directory to inspect, then dar calls readdir to get the next entry of the opened directory. Once there is nothing more to read, closedir is called. The problem here is that dar cannot start reading a directory, do some treatment, and start reading another directory: the opendir/readdir/closedir system calls are not re-entrant.
This is particularly critical for dar as it does a depth-first lookup in the directory tree. In other words, from the root, if we have two directories A and B, dar reads A's contents, then the contents of its subdirectories; once finished, it reads the next entry of the root directory (which is B), then reads the contents of B and of each of its subdirectories. Once finished with B, it must go back to the root again and read the next entry. In the meanwhile, dar has had to open many directories to get their contents.
For this reason dar caches directory contents (when it first meets a directory, it reads its whole contents and stores it in RAM). Only after that does dar decide whether or not to include a given directory. But at this point its contents has already been read, thus you may get the message that dar failed to read a given directory's contents, even though you explicitly specified not to include that particular directory in the backup.
Dar reports a SECURITY WARNING! SUSPICIOUS FILE, what does that mean!?
When dar reports the following message:
SECURITY WARNING! SUSPICIOUS FILE <filepath>: ctime changed since backup of reference was done, while no inode or data changed
you should be concerned and look for an explanation of the root cause that triggered dar to ring this alarm. As you probably know, a Unix file has three (sometimes four) dates:
- atime is changed anytime you read the file's contents or write to it (this is the last access time)
- mtime is changed anytime you write to the file's data (this is the last modification time)
- ctime is changed anytime you modify the file's attributes (this is the last change time)
- btime is never changed once a file has been created (this is the birth time or creation time); not all filesystems provide it.
In other words:
- if you only read the data of a file, only its atime will be updated [1]
- if you write some data to a file, its ctime and mtime will change, while atime will stay unchanged
- if you change ownership, permissions, extended attributes, etc., only ctime will change
- if you write to a file and then modify its atime or mtime to make it look like the file has not been read or modified, ctime will change in any case.
Yes, the point is that on most (if not all) Unix systems, beyond the kernel itself, user programs can manually set the atime and mtime to any arbitrary value (see the "touch" command for example), but to my knowledge, no system provides a means to manually set the ctime of a file. This value can thus not be faked.
However, some rootkits and other nasty programs that tend to hide themselves from the system administrator use this trick and modify the mtime to become more difficult to detect. Thus, the ctime keeps track of the date and time of their infamy. However, ctime may also change while neither mtime nor atime does, in several rare but normal situations. Thus, if you face this message, you should first verify the following points before concluding your system has been infected by a rootkit:
- have you added or removed a hardlink pointing to that file, while the file's data has not been modified since the last backup?
- have you changed this file's extended attributes (including Linux ACL and MacOS file forks) while the file's data has not been modified since the last backup?
- have you recently restored your data and are now performing a differential backup taking as reference the backup used to restore that same data? In other words, has that particular file just been restored from a backup (it was removed by accident, for example)?
- have you just moved from a dar version older than release 2.4.0 to dar version 2.4.0 or more recent?
- have you upgraded the package this file is part of since the last backup?
How to know atime/mtime/ctime of a file?
- mtime is provided by the command: ls -l
- atime is provided by the command: ls -l --time=atime
- ctime is provided by the command: ls -l --time=ctime
- the stat command provides all dates of a given file: stat <filename>
Note: with dar versions older than 2.4.0 (by default, unless the -aa option is used), once a file had been read for backup, dar set the atime back to the value it had before dar read it. This trick was used to accommodate some programs like leafnode (an NNTP caching program) that base their cache purging scheme on the atime of files. When you do a backup using dar 2.3.11 for example, files that had their mtime modified are saved as expected and their atime is set back to its original value (the value it had just before dar read them), which has the side effect of modifying the ctime. If you then upgrade to dar 2.4.0 or more recent and do a differential backup, if such a file has not been modified since, dar will see that the ctime has changed while no other metadata did (user, group, permissions, mtime), thus this alarm message will show for all files saved in the last 2.3.11 backup. At the next differential backup made using dar 2.4.0 (or more recent), the problem will not show anymore.
Well, if you cannot find a valid explanation among the ones presented above, you had better consider that your system has been infected by a rootkit or a virus, and use all the necessary tools (see below for examples) to find some evidence of it.
- Rootkit Hunter
- Unhide
- clam anti-virus
- and others...
Last point: if you can explain the cause of the alarm and are annoyed by it (you have hundreds of files concerned, for example), you can disable this feature by adding the -asecu switch to the command-line, as shown below.
[1] atime may also not be updated at all if the filesystem is mounted with the relatime or noatime option.
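For example, a differential backup with this check disabled (basenames are illustrative):
dar -c diff_backup -A full_backup -R / -asecu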
Can dar help copy a large directory tree?
The answer is "yes", and even for more than one reason:
- Many backup/copy tools do not take care of hard linked inodes (hard linked plain files, named pipes, char devices, block devices, symlinks)... dar does,
- Many backup/copy tools do not take care of sparse files... dar does,
- Many backup/copy tools do not take care of Extended Attributes... dar does,
- Many backup/copy tools do not take care of Posix ACL (Linux)... dar does,
- Many backup/copy tools do not take care of file forks (MacOS X)... dar does,
- Many backup/copy tools do not take any precautions while working on a live system... dar does.
Using the following command will do the trick without relying on temporary files or backups:
dar -c - -R <srcdir> --retry-on-change 3 -N | dar -x - --sequential-read -N -R <dstdir>
<srcdir> contents will be copied to <dstdir>; both must exist before running this command, and <dstdir> should be an empty directory.
Here is an example: we will copy the content of /home/my to /home2/my. First we create the destination directory, then we run dar:
mkdir /home2/my
dar -c - -R /home/my --retry-on-change 3 | dar -x - --sequential-read -R /home2/my
The --retry-on-change option lets dar retry the copy of a file up to three times if that file changed while dar was reading it. You can increase this number at will. If a file fails to be copied correctly after more than the allowed retries, a warning is issued about that file and it is flagged as dirty in the data flow; the second dar command will then ask you whether you want it to be restored (here, copied) or not.
"Piping" ('|' shell syntax) the first dar's output to the second dar's input makes the operation require no temporary storage: only virtual memory is used to perform this copy. Compression is thus not requested, as it would only slow down the whole process.
Last point: you should compare the copied data to the original before removing the latter, as no backup file has been dropped down to the filesystem. This can simply be done using:
diff -r <srcdir> <dstdir>
But no, diff will not check Extended Attributes, file forks, POSIX ACL, hard linked inodes, etc. If you want a more controllable way of copying a large directory, simply use dar with a real backup file: compare the backup against the original filesystem, restore the backup contents to their new place, and compare the restored filesystem against the original backup.
Any better idea? Feel free to contact dar's author for an update of this documentation!
Does dar compress per file or the whole backup?
Dar uses compression (gzip, lzo, bzip2, xz/lzma, zstd, lz4, ...) with different compression levels (1 for quick but low compression, up to 9 for best compression but slower execution) on a file by file basis. In other words, the compression engine is reset for each new file added to the backup. When a corruption occurs in a file like a compressed tar backup, it is not possible to decompress the data past that corruption: with tar you lose all files stored after such a data corruption.
Having compression per file instead limits the impact to one file inside the backup: all files stored before or after such a data corruption can still be restored from the corrupted backup. Compressing per file also opens the possibility of not compressing all files in the backup, in particular already compressed files (like *.jpeg, *.mpeg, some *.avi files and of course *.gz, *.bz2 or *.lzo files). Avoiding compressing already compressed files saves CPU cycles (in other words, it speeds up the backup process). And while compressing an already compressed file takes time for nothing, it also usually requires more storage space than if that same file had not been compressed a second time.
The drawback is that the overall compression ratio is slightly less good.
How to activate compression with dar? Use the --compression option (-z in short), telling the algorithm to use and the compression level (--compression=bzip2:9 or -zgzip:7 for example). You may omit the compression level (which defaults to 9) and even the compression algorithm, which defaults to gzip; thus -z or -zlzo are correct.
To select which files to compress or not, several options are available: --exclude-compression (-Z in short, the uppercase Z) and --include-compression (-Y in short). Both take as argument a mask that defines, based on their names, the files that have to be compressed or not. For example, -Z "*.avi" -Z "*.mp?" -Z "*.mpeg" will avoid compressing MPEG, MP3, MP2 and AVI files. Note that dar provides, in its /etc/darrc default configuration file, a long list of -Z options to avoid compressing the most common compressed file types, which you can activate by simply adding compress-exclusion on dar's command-line.
In addition to excluding/including files from compression based on their name, you can also exclude small files (for which the compression ratio is usually poor) using the --mincompr option, which takes a size as argument: --mincompr 1k will avoid compressing files whose size is less than or equal to 1024 bytes. You should find all details about these options in the dar man page.
Check also the -am and -ar options to understand how --exclude-compression and --include-compression interact with each other, and how to use regular expressions in place of glob expressions in masks.
What are the minimum and maximum slice sizes?
The minimum slice size is around 20 bytes, but you would only be able to store 3 to 4 bytes of information per slice, due to the slice header, which needs around 15 bytes in each slice (this varies depending on the options used and may increase in future backup format versions). But there is no maximum slice size! In other words you can give the -s and -S options as large a positive integer as required: thanks to its own internal integer type named "infinint", dar is able to handle arbitrarily large integers (file offsets, file sizes, etc.).
You can make use of suffixes like 'k' for kilo, M for mega, G for giga, etc. (all suffixes are listed here) to simplify your work. See also the -aSI and -abinary options to swap the meaning between kB (= 1000 bytes) and kiB (= 1024 bytes).
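For example, a backup cut in 2 GiB slices with a smaller first slice (basename, path and sizes are illustrative):
dar -c my_backup -R /data -s 2G -S 100M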
Last point: dar/libdar can be compiled giving the --enable-mode=64 option to ./configure (this is the default since release 2.6.0). This replaces the "infinint" type by 64 bit integers, for better performance and reduced memory usage. However this has some drawbacks on backup size and dates; see the limitations for more details. Since release 2.6.0, the default being the 64 bit mode, to have dar/libdar using infinint one needs to run ./configure --enable-mode=infinint.
You can find several applications relying on dar or directly on libdar to manage dar backups; these are referred to here as external software because they are neither maintained nor created by the author of dar and libdar. AVFS is one such external software: it provides a virtual filesystem layer for transparently accessing the content of backups and remote directories just like local files.
How does dar compare to tar or rsync?
It all depends on the use case you want to address. A benchmark has been set up to compare the performances, features and behaviors of dar, rsync and tar against a set of common use cases. Hopefully this will help you answer this question.
Why, when comparing a backup with the filesystem, does dar not report new files found on the filesystem?
Backup comparison (-d option) is to be seen as a step further than backup testing (-t option), where dar checks the backup's internal structure and usability. The step further here is to check not only that each part of the backup is readable and has a correct associated CRC, but also that it matches what is present on the filesystem. So yes, if new files are present on the filesystem, nothing has to be reported. If a file changed, dar reports that the file does not match what is in the backup; if a file is missing, dar cannot compare it with the filesystem and reports an error too.
So you want to know what has changed on your filesystem? No problem, do a differential backup! OK, you don't want a new backup, or you do not have the space for one: just output the backup to /dev/null and request on-fly isolation, as follows:
dar -c - -A <ref backup> -@ <isolated> ... other options ... > /dev/null
<ref backup> is the backup of reference or an isolated catalogue, and <isolated> is the name of the isolated catalogue to produce.
Once the operation has completed, you can list the isolated catalogue using the following command:
dar -l <isolated> -as
It will give you the exact difference between your current filesystem and the filesystem at the time the <ref backup> was made: modified files and new files are reported with [inref] for data, EA or both, while deleted files are reported by a [--- REMOVED ENTRY ----] information, followed by the estimated removal date and the type of the removed file ([-] for a plain file, [d] for a directory, and so on; more details in the dar man page, listing command).
Why is delta difference not activated by default?
Because delta difference is in theory subject to checksum collision (though it is very improbable), which could lead to a new version of a file being seen as identical to an older one while some changes took place in it. A second reason is to take care of users' preferences: some do not want this feature activated by default. Anyway, activating delta difference with dar is quite simple and flexible, see the note.
Why does dar ignore some files with long (for example cyrillic) filenames under Windows?
Dar/libdar was first developed for Linux. It has later been ported to many other operating systems. For Unix-like systems (FreeBSD, Solaris, ...), it can run as a native program by just recompiling it for the target OS and processor. For Windows systems it cannot, because Unix and Windows do not provide the same system calls at all. The easiest way to have dar running under Windows was to rely on Cygwin, which translates Unix system calls into Windows system calls. However, Cygwin brings some limitations: one of them is that it cannot provide filenames longer than 256 bytes, while today's Windows can have much longer filenames.
What is the point with cyrillic filenames? Cyrillic characters, unlike most latin ones, are not stored as a single byte: they usually use several bytes per character, so this maximum filename length is reached much quicker than with latin filenames, but the problem also exists with the latter.
The consequence is that when dar reads a directory containing such a too long filename, the Cygwin layer is not able to provide it entirely: the filename is truncated. When dar then wants to read information about that file, most of the time the truncated filename does not exist, and dar reports the message from the system that this file does not exist (which might sound strange from a user's point of view). Since release 2.5.4 dar reports instead that the filename has been truncated and that the file will be ignored.
I have a 32 bit Windows system, which binary package can I use?
Up to release 2.4.15 (included), the dar/libdar binaries for Windows were built on a 32 bit Windows (XP) system. After that release, binaries for Windows have been built using a 64 bit Windows system (7, now 8 and probably 10 soon). Unfortunately, the filenames of the binary packages for Windows did not reflect that change and were still labeled "i386", while the included binaries no longer support the i386 CPU family (which are 32 bit CPUs). This is an oversight that went unnoticed until Adrian Buciuman's remark on the dar-support mailing-list on September 23rd, 2016. In consequence, after that date, binary packages for Windows receive an additional field corresponding to the Windows flavor they have been built against.
Some may still need 32 bit Windows binaries of dar; unfortunately I no longer have access to such a system, but if you have such a Windows ISO image and a valid license to give me, I could install it into a virtual machine and provide binary packages for 32 bits too.
Until then, you can build the Windows binary yourself. Here follows the recipe:
Install Cygwin on Windows, including at least the following packages:
- clang C/C++ compiler
- cygwin devel
- doxygen
- gettext-devel
- liblzo2-devel
- libzzip-devel
- libgpgme-devel
- librsync-devel (starting with release 2.6.0)
- make
- tcsh
- zip
- upx
Then get the dar source code and extract its contents (either using Windows native tools or using tar under cygwin).
For clarity, let's assume you have extracted the dar source package for version x.y.z into the C:\Temp directory; you now have the directory C:\Temp\dar-x.y.z
Run a cygwin terminal and "cd" into that directory:
cd /cygdrive/c/Temp/dar-x.y.z
In the previous command, note that from within a cygwin shell, paths use slashes, not Windows backslashes; note also that the 'c' is lowercase while Windows shows an uppercase letter for drives...
But don't worry, we are almost finished; run the following script:
misc/batch_cygwin x.y.z
Starting with release 2.5.7 the syntax has changed:
misc/batch_cygwin x.y.z win32
The new "win32" or "win64" field is used to label the zip package containing the dar/libdar binaries for Windows; it is up to you to choose the value corresponding to your OS 32/64 bit flavor.
At the end of the process you will get a dar zip file for Windows in the C:\Temp\dar-x.y.z directory.
Feel free to ask for support on the dar-support mailing-list if you encounter any problem building the dar binary for Windows; this FAQ will be updated accordingly.
Path slash and back-slash consideration under Windows
The paths given to dar's arguments and options must respect the UNIX way (use slashes "/", not backslashes "\" as is usual under Windows); thus, for example, you have to use /temp in place of \temp. Moreover, drive letters cannot be used the usual way, like c:\temp; instead you will have to give the path as /cygdrive/c/temp.
As you see, the /cygdrive directory is a virtual directory that has all the drives as child directories. Here is a more global example:
c:\dar_win-1.2.1\dar -c /cygdrive/f/tmp/toto -s 2G -z1 -R "/cygdrive/c/My Documents"
Note that the path to the dar executable itself uses backslashes, as usual under Windows, while the paths given in arguments to dar use slashes.
Under Windows, which directory corresponds to /?
When running dar from a Windows command-line (thus not from the cygwin environment), dar's root directory is the parent directory of the one holding the dar.exe file. This does not mean that you cannot have dar back up anything outside this directory (you can, thanks to the /cygdrive/... path alias seen above), but when dar looks for darrc, it looks using this parent directory as the "/" root one.
Since release 2.6.14, the published dar packages for Windows are configured and built in such a way that dar.exe uses the provided darrc file located in the etc sub-directory. So darrc is now usable out of the box.
However, if you rename the directory where dar.exe is located, whose name is something like dar64-x.y.z-win64, the dar.exe binary will still look for a darrc at /dar64-x.y.z-win64/etc/darrc, taking as root directory the parent of the directory where it resides. You can still explicitly rely on it by means of a -B option pointing to the modified path where the darrc is located.
Why is dar slower than lzop when using lzo compression?
When using the "lzo" compression algorithm, dar/libdar always uses the lzo1x_999 algorithm with the requested compression level (from 1 to 9) as argument. Dar thus provides 9 different compression/speed levels with lzo.
On the other hand, as of today (2017), lzop, the command-line tool, uses the much degraded lzo algorithm known as lzo1x_1_15 for level 1, and the intermediate lzo1x_1 algorithm for levels 2 to 6, which makes levels 2 to 6 totally equivalent from the lzop program's point of view. Last, compression levels 7 to 9 of lzop use the same lzo1x_999 algorithm as dar/libdar, which is the only algorithm of the lzo family that makes use of a compression level. In total, lzop only provides 5 different compression levels/algorithms.
So now you know why dar is slower than lzop when using lzo compression at levels 1 to 6.
To get the equivalent of what lzop provides at level 1 and levels 2 to 6, dar/libdar provides two additional lzo-based compression algorithms: lzop-1 and lzop-3. As you guess, lzop-1 uses the lzo1x_1_15 algorithm as lzop does for its compression level 1, and lzop-3 uses the lzo1x_1 algorithm as lzop does for its compression levels 2 to 6. For both the lzop-1 and lzop-3 algorithms, the compression level is not used: you can keep the default or change its value, this will not change dar's behavior.
compression level for lzop | algorithm for dar | compression level for dar | lzo algorithm used |
---|---|---|---|
1 | lzop-1 | - | lzo1x_1_15 |
2 | lzop-3 | - | lzo1x_1 |
3 | lzop-3 | - | lzo1x_1 |
4 | lzop-3 | - | lzo1x_1 |
5 | lzop-3 | - | lzo1x_1 |
6 | lzop-3 | - | lzo1x_1 |
- | lzo | 1 | lzo1x_999 |
- | lzo | 2 | lzo1x_999 |
- | lzo | 3 | lzo1x_999 |
- | lzo | 4 | lzo1x_999 |
- | lzo | 5 | lzo1x_999 |
- | lzo | 6 | lzo1x_999 |
7 | lzo | 7 | lzo1x_999 |
8 | lzo | 8 | lzo1x_999 |
9 | lzo | 9 | lzo1x_999 |
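For example, to trade compression ratio for speed the way lzop does at its level 1 (basename and path are illustrative):
dar -c my_backup -R /data -zlzop-1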
What is libthreadar and why libdar relies on it?
libthreadar is a wrapping library around POSIX C threads. It was originally part of webdar, a libdar-based web server project, but as this code became necessary inside libdar too, all these thread-related classes have been put into a separate library called libthreadar, which today both webdar and libdar rely upon.
dar/libdar relies on libthreadar to manage several threads inside libdar, which is necessary to efficiently implement the remote repository feature based on libcurl (available starting with release 2.6.0).
Why not use the boost library or the thread support brought by C++11?
First, because no compiler implemented C++11 at the time webdar was started, and second, because boost threads were not found adapted to the need, for the following reasons:
- I wanted a more object-oriented approach than passing a function to be run in a separate thread, as the boost/C++11 interface provides; hence the pure virtual class libthreadar::thread, from which you derive your own classes.
- I wanted to avoid functions/methods with multiple parameters, as it has shown in the past with libdar to be a source of problem when it comes to backward compatibily while adding new features. Instead the inherited class can provide as many different methods to setup individual parameters before the thread is run()
- As a consequence, another need was to be able to set an object before the thread is effectively run, the C++ object existence need not to match the thread existence, in other words the object shall be created first and the thread run() afterward. Of course the destruction of a thread object would kill the thread it is wrapping. The other advantage doing that way was the possibility to re-run() a thread from the same object once a first thread had completed eventually modifying some parameters through the method provided by the inherited class from libthreadar::thread
- Last but not least, I wanted to have an exception thrown from within a thread and not caught up to the global thread function (thus leading the thread to end), to be kept over the thread existance and relaunched into the thread calling the join() method for that object. Thus avoiding having a coherent treatment of errors using C++ exception when thread were used.
libthreadar does all this and is a completely independant piece of software from both webdar and dar/libdar. So you can use it freely (LGPLv3 licensing) if you want. As all project I've been published, it is documented as much as possible, feedback is always welcome of something is not clear, wrong or missing.
libthreadar source code can be found here, documentation is available in source package as well as online here
I have public key authentication working with ssh/sftp, how can I have dar use this public key authentication for sftp too?
The answer is as simple as adding the following option when calling dar: -afile-auth
Why not do pubkey authentication by default and fall back to password authentication?
First, this is by choice, because -afile-auth also uses ~/.netrc even when using sftp. Second, it would be possible to first try public key authentication and fall back to password authentication, but it would require libdar to first connect, possibly failing if the pubkey was not provisioned or wrong, then connect again asking the user for a password on the command line. It seems more efficient to do otherwise: file authentication when the user asks for it, password authentication else. The counterpart is not huge for the user (you can add -afile-auth in your ~/.darrc and forget about it).
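For example (a sketch; host and path are placeholders):
# create a backup on a remote sftp server, authenticating with the
# ssh public/private key pair (and ~/.netrc if present)
dar -afile-auth -c sftp://login@remote.host/some/path/backup [other options]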
I cannot get dar to connect to a remote server using SFTP, it fails with: SSL peer certificate or SSH remote key was not OK
This may be due to several well known reasons:
- dar/libdar cannot find the known_hosts file
- if using key authentication instead of password, dar/libdar cannot find the private key file
- if using key authentication instead of password, dar/libdar cannot find the public key file
- you have an outdated version of the libssh2 or libcurl library and lack support for ecdsa host keys
How to work around this?
For the first three cases, you can make use of environment variables to change the default behavior:
DAR_SFTP_KNOWNHOSTS_FILE
DAR_SFTP_PUBLIC_KEYFILE
DAR_SFTP_PRIVATE_KEYFILE
They respectively default to:
$HOME/.ssh/known_hosts
$HOME/.ssh/id_rsa.pub
$HOME/.ssh/id_rsa
Changing them according to your needs is done before running dar, from the shell. For example, if you use sh or bash:
export DAR_SFTP_KNOWNHOSTS_FILE=~/.ssh/known_hosts_alternative
# then use dar as expected
dar -c sftp://....
dar -t sftp://...
if you use csh or tcsh:
setenv DAR_SFTP_KNOWNHOSTS_FILE ~/.ssh/known_hosts_alternative
# then use dar as expected
dar -c sftp://...
dar -t sftp://...
For the fourth and last case, things are trickier:
First, if you don't already know what the known_hosts file is used for:
- It is used by ssh/sftp to validate that the host you connect to is not a pirate host trying to put itself between you and the real sftp/ssh server you intend to connect to. Usually, the first time you connect to an sftp/ssh server, you need to validate the fingerprint of the key received from the server (checking it by another means, like a phone call to the server's admin, https web browsing to the server page, and so on). When you validate the host key the first time, a new line is added to the known_hosts file so that the ssh/sftp client can automatically check, the next time you connect, that the host is still the correct one.
The known_hosts file is usually located in your home directory at
~/.ssh/known_hosts
and looks like this:
asteroide.lan ecdsa-sha2-nistp256 AAAAE2V...
esxi,192.168.5.20 ssh-rsa AAAAB3N...
192.168.6.253 ssh-rsa AAAAB3N...
Each line concerns a different sftp/ssh server and contains three fields
<hostname or IP>
- this is the server we have already connected to
<host-key type>
- this is the type of key
<key>
- this is the public key the server has sent the first time we connected
We will focus on the second field.
dar/libdar relies on libcurl for networking protocol interaction, which in turn relies on libssh2. Before libssh2 1.9.0, only rsa host keys were supported, leading to this message as soon as the known_hosts file contained a non-rsa host key (even for a host listed in the known_hosts file other than the one we intend to connect to). As of December 2020, while libssh2 1.9.0 now supports additional host key types (ecdsa and ed25519), libcurl does not yet leverage this support and the problem persists. I'm confident that things will be updated for this problem to be solved in a few months.
In the meantime, several options are available to work around that limitation:
- disable known_hosts checking, by setting the environment variable DAR_SFTP_KNOWNHOSTS_FILE to an empty string. Libdar will then not ask libcurl/libssh2 to check the known hosts validity, but this is not a recommended option! It opens the door to man-in-the-middle attacks.
- copy the known_hosts file to ~/.ssh/known_host_for_libssh2 and remove from this copy all the lines corresponding to host keys that are not supported by libssh2, then set the DAR_SFTP_KNOWNHOSTS_FILE variable to point to that new file (see the sketch below). This workaround is OK only if the unsupported host keys are not the ones you intend to have dar communicating with...
- replace the host key of the ssh/sftp server by an ssh-rsa one. OK, this will most probably require you to have root permission on the remote ssh/sftp server... which is not possible when using a public cloud service over the Internet.
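The second workaround can be scripted. A minimal sketch for sh/bash (the filtered file name is just an example):
# keep only the ssh-rsa host keys in a copy of known_hosts
grep ' ssh-rsa ' ~/.ssh/known_hosts > ~/.ssh/known_host_for_libssh2
# have dar/libdar use that filtered copy
export DAR_SFTP_KNOWNHOSTS_FILE=~/.ssh/known_host_for_libssh2
dar -t sftp://login@remote.host/some/path/backup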
Cannot open catalogue: Cannot handle such a too large integer. What to do?
Unless using dar/libdar built in 32-bit mode, you should not meet this error message from dar, unless exceeding the 64-bit integer limits.
To know which integer type dar relies on (infinint, 32 bits or 64 bits), run dar -V and check the line Integer size used:
# src/dar_suite/dar -V
dar version 2.7.0_dev, Copyright (C) 2002-2020 Denis Corbin
Long options support : YES
Using libdar 6.3.0 built with compilation time options:
gzip compression (libz) : YES
bzip2 compression (libbzip2) : YES
lzo compression (liblzo2) : YES
xz compression (liblzma) : YES
zstd compression (libzstd) : YES
lz4 compression (liblz4) : YES
Strong encryption (libgcrypt): YES
Public key ciphers (gpgme) : YES
Extended Attributes support : YES
Large files support (> 2GB) : YES
ext2fs NODUMP flag support : YES
Integer size used : 64 bits
Thread safe support : YES
Furtive read mode support : YES
Linux ext2/3/4 FSA support : YES
Mac OS X HFS+ FSA support : NO
Linux statx() support : YES
Detected system/CPU endian : little
Posix fadvise support : YES
Large dir. speed optimi. : YES
Timestamp read accuracy : 1 nanosecond
Timestamp write accuracy : 1 nanosecond
Restores dates of symlinks : YES
Multiple threads (libthreads): YES (1.3.1)
Delta compression (librsync) : YES
Remote repository (libcurl) : YES
argon2 hashing (libargon2) : YES
compiled the Jan 7 2021 with GNUC version 8.3.0
dar is part of the Disk Backup suite (Release 2.7.0_dev)
dar comes with ABSOLUTELY NO WARRANTY; for details
type `dar -W'. This is free software, and you are welcome
to redistribute it under certain conditions; type `dar -L | more'
for details.
If you read "infinint" and still see the above error message from dar, please report a bug: this should never occur. Else, the problem appears when using dar before release 2.5.13, either at backup creation time when dar meets a file with a negative date, or at backup reading time, when reading a backup generated by dar 2.4.x or older that contains a file with a very distant date in the future, something dar 2.4.x and below recorded when the system returned a negative date for a file to save.
What is a negative date? Dates of files are recorded in "unix" time, that is to say the number of seconds elapsed since the beginning of year 1970. A negative date means a date before 1970, which should normally not be met today, because the few computers that existed at that time had no such way of storing dates, nor the same files and filesystems.
However, for some reasons, such negative dates can be returned by several operating systems (Linux-based ones among others), and dar today does not have the ability to record such dates (but if you need dar to store negative dates for a good reason, please fill a feature request with the reason you need this feature).
Since release 2.5.13, when the system reports a negative date for a file to save, dar asks the user whether to consider the date as zero. This requires user interaction and may not fit all needs. For that reason, the -az option has been added to automatically assume negative dates read from the filesystem to be equal to zero (January 1st 1970, 00:00 GMT) without user interaction.
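For example (a sketch; backup name and path are placeholders):
# assume any negative file date is January 1st 1970, 00:00 GMT, without asking
dar -c my_backup -R /some/dir -az [other options]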
I have a diff/incremental backup and I want to convert it to a full backup, how to do that? It is possible to convert a differential backup if you also have the full backup it has been based on, in other words: the backup of reference. This is pretty simple to do:
dar -+ new_full_backup -A backup_of_reference -@ differential_backup full-from-diff [other options]
- new_full_backup is the backup that will be created according to the other provided options (compression, encryption, slicing, hashing and so on, as specified by the arguments).
- backup_of_reference is the full backup that was used as reference for the differential backup
- differential_backup is the differential backup you want to convert into a full backup
The important point is the last argument "full-from-diff", which is defined in /etc/darrc and makes the merging operation used here (-+ option) work as expected, for the resulting backup to be the same as if a full backup had been done instead of a differential backup at the time "differential_backup" was created.
For incremental backups (backups whose reference is not a full backup), you can also use this method, but you first need to create the full backup equivalent of the incremental/differential backup that was used as reference for this incremental backup. The process should thus follow the same order that was used to create the backups, as sketched below.
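A sketch with hypothetical backup names, for a chain full -> diff1 -> incr2, where incr2 took diff1 as reference:
# first rebuild the full backup equivalent at the time diff1 was made
dar -+ full_equiv_1 -A full -@ diff1 full-from-diff
# then use it as reference to convert incr2 the same way
dar -+ full_equiv_2 -A full_equiv_1 -@ incr2 full-from-diff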
How to use dar with tapes (like LTO tapes)?
dar (Disk Archive) was designed to replace tar (Tape Archive) and leverage the direct access brought by disks, something tar was not able to use. A tape by nature does not allow jumping to a given position (or at least it is so inefficient to skip back and forth that this is barely used). That said, dar has also evolved to replace tar when it comes to using tapes (like LTO tapes) as backup media. The advantages of dar here are the integrated ciphering, efficient compression (no need to compress already compressed files), resiliency, redundancy and CRC data protection, to cite the most interesting features.
Backup operation
dar can produce a backup on its stdout, which can be piped or redirected to a tape device. That's easy:
dar -c - (other options)... > /dev/tape
dar -c - (other options) | some_command > /dev/tape
Things get more complicated when the backup exceeds the size of a single tape. For that reason, dar_split has been added to the suite of dar programs. Its purpose is to receive the backup produced by dar on its standard input and write it to a given file, up to the time the write fails due to lack of space. At that time, it records what was written and what still remains to be written, closes the descriptor of the target file, displays a message to the user and waits for the user to hit enter. Then it reopens the file and continues writing the pending data to that target file. The user is expected to have made the necessary arrangements for further writing to this same file (or special device) to work, for example by replacing the tape by a new one rewound at its beginning, a tape that will be overwritten by the continuation of the dar backup:
dar -c - (other options)... | dar_split split_output /dev/tape
Testing operation
Assuming you have completed your backup over three tapes, you should now be concerned with testing the backup:
dar_split split_input /dev/tape | dar -t - --sequential-read
Before running the previous command, you should have rewound all your tapes to the offset they had when you used them to write the dar backup (their beginning, most of the time). The first tape should have been inserted in the drive, ready for reading. Neither dar nor dar_split know about the location of the data on tape; they will not seek the tape forward or backward, they will just read (or write, depending on the requested operation) sequentially.
When dar_split's reading reaches the end of a tape, the process pauses and lets you swap the tape with the following one. You can also take the time to rewind the tape before swapping it, if you want. Once the next tape is ready in the drive and set at the proper offset, just hit enter in the dar_split terminal for the process to continue.
At the end of the testing, dar will report the backup status (hopefully the backup test will succeed), but dar_split does not know anything about that and still keeps trying to provide data to dar, so you will have to hit CTRL-C to stop it.
To avoid stopping dar_split by hand, you can indicate to dar_split the number of tapes used for the backup, by means of the -c option. If, after the last tape at backup time, you wrote an EOF tape mark (mt -f /dev/tape weof), then dar_split will stop by itself after that number of tapes. In our example, the backup expanded over three tapes, hence the -c 3 option:
dar_split -c 3 split_input /dev/tape | dar -t - --sequential-read
Listing operation
Listing can be done the same way as the testing operation seen above, just replacing -t by -l:
dar_split split_input /dev/tape | dar -l - --sequential-read
But what a pity not to use the isolated catalogue feature! Catalogue isolation lets you keep on disk (not on tape) a small file containing the table of contents of the backup. Such a small backup can be used as a backup of the internal catalogue of the backup (which resides on tape), to recover from a corruption of that part of the backup (this gives an additional level of protection for backup metadata). It can also be used for backup content listing, it can be provided to dar_manager, and, most interesting, it can be used as reference for incremental or differential backups, in place of reading the reference backup content from tapes.
Assuming you did not create an isolated catalogue at the time of the backup, let's do it once the backup has been written to tape:
dar_split split_input /dev/tape | dar -A - --sequential-read -C isolated -z
This will lead dar to read the whole backup. Thus, it is more efficient to create the isolated catalogue "on-fly", which means during the backup creation process, in order to avoid this additional reading operation:
dar -c - --on-fly-isolate isolated (other options)... | dar_split split_output /dev/tape
You will get a small isolated.1.dar file (you can of course replace isolated, after the -C or --on-fly-isolate options, by a more meaningful name), located in the current directory by default, while your backup is sent to tapes, as already seen earlier.
The isolated catalogue can now be used in place of the backup on tapes; the process becomes much, much faster for listing the backup content:
dar -l isolated (other options like filters)...
Restoration operation
You can perform a restoration the same way we did the backup testing above, just replacing -t by -x:
dar_split split_input /dev/tape | dar -x - --sequential-read (other options like --fs-root and so on)
But it is better to leverage an isolated catalogue, in particular if you only plan to restore a few files. Without an isolated catalogue, dar will have to read the whole backup up to its end (the same as tar does, but for other reasons) to reach the internal catalogue, which contains additional information (like the files that have been removed since the backup of reference was made). Using an isolated catalogue avoids that and lets dar stop reading earlier, that is to say, once the last file to restore has been reached in the backup. So if this file is located near the beginning of the backup, you can save a lot of time using an isolated catalogue!
dar_split split_input /dev/tape | dar -x - -A isolated --sequential-read (other options, like --fs-root and so on)
Rate limiting
It is sometimes necessary to rate-limit the output from and to tapes. dar_split has a -r option for that purpose:
dar_split -r 10240000 split_input /dev/tape | dar ...
dar ... | dar_split -r 20480000 split_output /dev/tape
The argument to the -r option is expected in bytes per second.
Block size
Some tape devices do not behave well if the data requested from or sent to them uses large blocks at once. Usually the operating system knows about that and splits application-provided data into smaller blocks if necessary. When this is not the case, use the -b option, which receives the maximum block size in bytes that dar_split will use. It does not matter whether the block size used when writing is different from the one used at reading time; both must just not exceed the block size supported by the tape device:
dar_split -b 1048576 split_input /dev/tape | dar ...
dar ... | dar_split -b 1048576 split_output /dev/tape
Differential and incremental backups
Differential and incremental backups are built the same way: by providing the backup of reference at the time of the backup creation, by means of dar's -A option. One could use dar_split twice for that: once to read the backup of reference from a set of tapes, an operation that precedes the backup itself, then a second dar_split command to send the new backup to tapes... The problem is that the second dar_split will open the tape device for writing while it first has to be open for reading by the first dar_split command, in order to fetch the backup of reference. Thus, in this context, we have no choice (unless we have two tape drives): we must rely on an isolated catalogue of the backup of reference:
dar -c - -A isolated_cat_of_ref_backup (other options)... | dar_split split_output /dev/tape
dar_split and tar
dar_split is by design a command separated from dar. You can thus use it with any command other than dar; in particular, yes, you can use it with tar, if you don't want to rely on the additional features and resiliency dar provides.
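For example (a sketch; the archived directory is a placeholder):
# write a multi-tape tar archive through dar_split
tar -cf - /some/dir | dar_split split_output /dev/tape
# and later list its content back from the tapes
dar_split split_input /dev/tape | tar -tf -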
Since around year 2010, this is a question/suggestion/remark/review that has haunted the dar-support mailing-list and the new feature requests, resurrecting from time to time: why does dar not compress small files together in a dar archive for better compression, like tar (its grand and venerable brother) does?
First point to note: tar does not compress at all. It is gzip, bzip2, xz or other similar programs that take as unstructured input what tar outputs, in order to produce an unstructured compressed data stream redirected into a file.
It would be tempting to answer "you can do the same with dar!", but there are better things to do, read below.
But first, let's recall dar's design and objectives:
- compression is done per file
- a given file's data can be accessed directly
Doing so has several advantages:
- In a given backup/archive, you can avoid compressing some files while compressing others (a gain of time and space, as compressing already compressed files usually wastes storage space).
- You can quickly restore a particular file, even from a several-petabytes archive/backup: no need to read (disk I/O) and decompress (CPU cycles) all the data located before that file in the archive.
- Your backups are more robust: if even a single byte of data corruption occurred at some place in one of your backups, it will concern only one file, and you will be able to restore all other files, even those located after the corruption. At the opposite, with tar's way of compressing, you would lose all the data following the data corruption...
dar works that way because tar's way did not address some major concerns in the backup area. Yes, this has the drawback of degrading the compression ratio, but this is a design choice.
Now, looking for the best of both approaches, some proposed to gather small files together and compress them as a group. This would not only break the three advantages exposed above, but also break another feature, which is the order in which files are stored: dar does not inspect the same directory twice, neither at backup time nor at restoration time. Doing so avoids saving the full path of each directory and file (and at two places: in the in-line metadata and in the catalogue). This also leads to better performance, as it better leverages the disk cache for metadata (directory contents). OK, one could say that today, with SSD and NVMe, this is negligible, but one would ignore that direct RAM access from cache is still much faster than any NVMe disk access.
So, if you can't afford keeping small files uncompressed (see dar's --mincompr, -X and -I options for example), or if compressing them with dar versus what tar does makes such a big difference that it is worth compressing them together, you have three options:
- use tar in dar
Make a tar archive of the many small files you have, just a tar file, without compression (see the sketch after this list). Note: you can automate this when entering some particular directory trees of your choice by means of the -<, -> and -= options, and remove those temporary tar files when dar exits those directories at backup time. You would also have to exclude the files used to build the tar file you created dynamically (see the -g/-P/-X/-I/-[/-] options).
Then let dar perform the backup, compressing those tar files together with the other files if they satisfy the --mincompr size, or any other filtering of your choice (see the -Z and -Y options). Doing so lets you leverage the parallel compression and reduced execution time brought by dar, something you cannot have with tar alone.
Of course, you also benefit from all the other dar features (slicing, ciphering, on-fly slice hashing, isolated catalogues, differential/incremental/decremental backups... and even binary delta!)
But yes, you will lose dar's three advantages seen above, though just for those small files you have gathered in a tar-in-dar file, not for the rest of what's under backup.
- use tar alone
If dar does not match your needs and/or if you do not need to leverage any of the three dar advantages seen above, tar is probably a better choice for you. That's a pity, but no single tool matches all needs...
- describe in detail a new implementation/enhancement
The proposal should take into account dar's design objectives (robustness to data corruption, efficient directory seeking, fast access to any file's data) in a way or another.
But please, do not make an imprecise proposal that assumes it will just "magically" work: I only like magic when I go to a magic show ;)
Please detail both the backup and restoration processes. Often, pulling out the missing details one after the other results in something unfeasible, with unexpected complexity and/or much less gain than expected. Also look at the Dar Archive Structure to see how it could fit, or if not, what part should be redesigned and how.
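As a sketch of the manual "tar in dar" approach referenced above (paths and names are just examples):
# gather a directory of many small files into one uncompressed tar file
tar -cf /home/joe/maildir.tar -C /home/joe maildir
# back up with dar, pruning the original directory so only the tar file
# (which dar will compress) is saved
dar -c my_backup -R / -g home/joe -P home/joe/maildir [other options]
# remove the temporary tar file afterward
rm /home/joe/maildir.tar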