Merging upstream version 1.20.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 72bcf08df5
commit e24aefbbb2
31 changed files with 1242 additions and 685 deletions
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 10 April 2017
-@set VERSION 1.19
+@set UPDATED 12 February 2018
+@set VERSION 1.20
 
 @dircategory Data Compression
 @direntry
@@ -38,7 +38,7 @@ This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).
 * Introduction:: Purpose and features of lziprecover
 * Invoking lziprecover:: Command line interface
 * Data safety:: Protecting data from accidental loss
-* Repairing files:: Fixing bit-flip and similar errors
+* Repairing files:: Fixing bit flips and similar errors
 * Merging files:: Fixing several damaged copies
 * File names:: Names of the files produced by lziprecover
 * File format:: Detailed format of the compressed file
@@ -50,7 +50,7 @@ This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).
 @end menu
 
 @sp 1
-Copyright @copyright{} 2009-2017 Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2018 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission
 to copy, distribute and modify it.
@@ -79,7 +79,7 @@ availability:
 @itemize @bullet
 @item
 The lzip format provides very safe integrity checking and some data
-recovery means. The lziprecover program can repair bit-flip errors (one
+recovery means. The lziprecover program can repair bit flip errors (one
 of the most common forms of data corruption) in lzip files, and provides
 data recovery capabilities, including error-checked merging of damaged
 copies of a file. @xref{Data safety}.
@@ -111,8 +111,8 @@ the compressors in the lzip family; lzip, plzip, minilzip/lzlib, clzip
 and pdlzip.
 
 If the cause of file corruption is damaged media, the combination
-@w{GNU ddrescue + lziprecover} is the best option for recovering data
-from multiple damaged copies. @xref{ddrescue-example}, for an example.
+@w{GNU ddrescue + lziprecover} is the best option for recovering data from
+multiple damaged copies. @xref{ddrescue-example}, for an example.
 
 If a file is too damaged for lziprecover to repair it, all the
 recoverable data in all members of the file can be extracted with the
@@ -139,6 +139,9 @@ undergone the process of decompression.
 @node Invoking lziprecover
 @chapter Invoking lziprecover
 @cindex invoking
+@cindex options
+@cindex usage
+@cindex version
 
 The format for running lziprecover is:
 
@@ -151,7 +154,7 @@ When decompressing or testing, @samp{-} used as a @var{file} argument
 means standard input. It can be mixed with other @var{files} and is read
 just once, the first time it appears in the command line.
 
-Lziprecover supports the following options:
+lziprecover supports the following options:
 
 @table @code
 @item -h
@@ -191,25 +194,25 @@ lzma-alone file as follows:
 @itemx --stdout
 Write decompressed data to standard output; keep input files unchanged.
 This option is needed when reading from a named pipe (fifo) or from a
-device. Use it also to recover as much of the uncompressed data as
+device. Use it also to recover as much of the decompressed data as
 possible when decompressing a corrupt file.
 
 @item -d
 @itemx --decompress
-Decompress the specified file(s). If a file does not exist or can't be
-opened, lziprecover continues decompressing the rest of the files. If a
-file fails to decompress, lziprecover exits immediately without
+Decompress the specified files. If a file does not exist or can't be
+opened, lziprecover continues decompressing the rest of the files. If a file
+fails to decompress, or is a terminal, lziprecover exits immediately without
 decompressing the rest of the files.
 
 @item -D @var{range}
 @itemx --range-decompress=@var{range}
 Decompress only a range of bytes starting at decompressed byte position
 @samp{@var{begin}} and up to byte position @w{@samp{@var{end} - 1}}.
-This option provides random access to the data in multimember files; it
-only decompresses the members containing the desired data. In order to
-guarantee the correctness of the data produced, all members containing
-any part of the desired data are decompressed and their integrity is
-verified.
+Byte positions start at 0. This option provides random access to the
+data in multimember files; it only decompresses the members containing
+the desired data. In order to guarantee the correctness of the data
+produced, all members containing any part of the desired data are
+decompressed and their integrity is verified.
 
 Four formats of @var{range} are recognized, @samp{@var{begin}},
 @samp{@var{begin}-@var{end}}, @samp{@var{begin},@var{size}}, and
@@ -237,7 +240,7 @@ Keep (don't delete) input files during decompression.
 @item -l
 @itemx --list
 Print the uncompressed size, compressed size and percentage saved of the
-specified file(s). Trailing data are ignored. The values produced are
+specified files. Trailing data are ignored. The values produced are
 correct even for multimember files. If more than one file is given, a
 final line containing the cumulative sizes is printed. With @samp{-v},
 the dictionary size, the number of members in the file, and the amount
@@ -297,11 +300,13 @@ on the number of members in @samp{@var{file}}.
 
 @item -t
 @itemx --test
-Check integrity of the specified file(s), but don't decompress them.
-This really performs a trial decompression and throws away the result.
-Use it together with @samp{-v} to see information about the file(s). If
-a file fails the test, does not exist, can't be opened, or is a
-terminal, lziprecover continues checking the rest of the files.
+Check integrity of the specified files, but don't decompress them. This
+really performs a trial decompression and throws away the result. Use it
+together with @samp{-v} to see information about the files. If a file
+fails the test, does not exist, can't be opened, or is a terminal, lziprecover
+continues checking the rest of the files. A final diagnostic is shown at
+verbosity level 1 or higher if any file fails the test when testing
+multiple files.
 
 @item -v
 @itemx --verbose
@@ -311,9 +316,43 @@ verbosity level, showing status, compression ratio, dictionary size,
 trailer contents (CRC, data size, member size), and up to 6 bytes of
 trailing data (if any) both in hexadecimal and as a string of printable
 ASCII characters.@*
+Two or more @samp{-v} options show the progress of decompression.@*
 In other modes, increasing verbosity levels show final status, progress
 of operations, and extra information (for example, the failed areas).
 
+@item --loose-trailing
+When decompressing, testing or listing, allow trailing data whose first
+bytes are so similar to the magic bytes of a lzip header that they can
+be confused with a corrupt header. Use this option if a file triggers a
+"corrupt header" error and the cause is not indeed a corrupt header.
+
+@item --dump-tdata
+Dump the trailing data (if any) of one or more regular files to standard
+output, or to a file if the @samp{--output} option is used. If more than
+one file is given, the trailing data of all files are concatenated. If a
+file does not exist, can't be opened, or is not regular, lziprecover
+continues processing the rest of the files. If the dump fails in one
+file, lziprecover exits immediately without processing the rest of the
+files.
+
+@item --remove-tdata
+Remove the trailing data from regular files in place. The date of each
+file is preserved if possible. If the removal fails in one file,
+lziprecover continues processing the rest of the files. This option may
+be dangerous if the file is corrupt or if the trailing data contain a
+forbidden combination of characters. @xref{Trailing data}. Verify that
+@w{@samp{lzip -cd file.lz | wc -c}} and the uncompressed size shown by
+@w{@samp{lzip -l file.lz}} match before attempting the removal.
+
+@item --strip-tdata
+Copy one or more regular files to standard output (or to a file if the
+@samp{--output} option is used), stripping the trailing data (if any)
+from each file. If more than one file is given, the files are
+concatenated. If a file does not exist, can't be opened, or is not
+regular, lziprecover continues processing the rest of the files. If a
+file fails to copy, lziprecover exits immediately without processing the
+rest of the files.
+
 @end table
 
 Numbers given as arguments to options may be followed by a multiplier
@@ -365,12 +404,12 @@ compressed it, and stored two copies on separate media. Years later you
 notice that both copies are corrupt.
 
 If you compressed with gzip and both copies suffer any damage in the
-data stream, even if it is just one altered bit, the original data can't
-be recovered.
+data stream, even if it is just one altered bit, the original data can
+only be recovered by an expert, if at all.
 
 If you used bzip2, and if the file is large enough to contain more than
-one compressed data block (usually larger than 900 kB uncompressed), and
-if no block is damaged in both files, then the data can be manually
+one compressed data block (usually larger than @w{900 kB} uncompressed),
+and if no block is damaged in both files, then the data can be manually
 recovered by splitting the files with bzip2recover, verifying every
 block and then copying the right blocks in the right order into another
 file.
@@ -391,7 +430,7 @@ Lziprecover can repair perfectly most files with small errors (up to one
 single-byte error per member), without the need of any extra redundance
 at all. If the reparation is successful, the repaired file will be
 identical bit for bit to the original. This makes lzip files resistant
-to bit-flip, one of the most common forms of data corruption.
+to bit flip, one of the most common forms of data corruption.
 
 The error may be located anywhere in the file except in the first 5
 bytes of each member header or in the @samp{Member size} field of the
@@ -400,9 +439,9 @@ can be easily repaired with a text editor like GNU Moe (@pxref{File
 format}). If the error is in the member size, it is enough to ignore the
 message about @samp{bad member size} when decompressing.
 
-Bit-flip happens when one bit in the file is changed from 0 to 1 or vice
+Bit flip happens when one bit in the file is changed from 0 to 1 or vice
 versa. It may be caused by bad RAM or even by natural radiation. I have
-seen a case of bit-flip in a file stored on an USB flash drive.
+seen a case of bit flip in a file stored on an USB flash drive.
 
 One byte may seem small, but most file corruptions not produced by
 transmission errors or I/O errors just affect one byte, or even one bit,
@@ -463,7 +502,7 @@ into clusters and then merging the files as if each cluster were a
 single error.
 
 Here is a real case of successful merging. Two copies of the file
-@samp{icecat-3.5.3-x86.tar.lz} (compressed size 9 MB) became corrupt
+@samp{icecat-3.5.3-x86.tar.lz} (compressed size @w{9 MB}) became corrupt
 while stored on the same NAND flash device. One of the copies had 76
 single-bit errors scattered in an area of 1020 bytes, and the other had
 3028 such errors in an area of 31729 bytes. Lziprecover produced a
@@ -592,9 +631,10 @@ padding zero bytes to a lzip file.
 @item
 Useful data added by the user; a cryptographically secure hash, a
 description of file contents, etc. It is safe to append any amount of
-text to a lzip file as long as the text does not begin with the string
-"LZIP", and does not contain any zero bytes (null characters). Nonzero
-bytes and zero bytes can't be safely mixed in trailing data.
+text to a lzip file as long as none of the first four bytes of the text
+match the corresponding byte in the string "LZIP", and the text does not
+contain any zero bytes (null characters). Nonzero bytes and zero bytes
+can't be safely mixed in trailing data.
 
 @item
 Garbage added by some not totally successful copy operation.
@@ -604,12 +644,16 @@ Malicious data added to the file in order to make its total size and
 hash value (for a chosen hash) coincide with those of another file.
 
 @item
-In very rare cases, trailing data could be the corrupt header of another
+In rare cases, trailing data could be the corrupt header of another
 member. In multimember or concatenated files the probability of
 corruption happening in the magic bytes is 5 times smaller than the
 probability of getting a false positive caused by the corruption of the
 integrity information itself. Therefore it can be considered to be below
-the noise level.
+the noise level. Additionally, the test used by lziprecover to discriminate
+trailing data from a corrupt header has a Hamming distance (HD) of 3,
+and the 3 bit flips must happen in different magic bytes for the test to
+fail. In any case, the option @samp{--trailing-error} guarantees that
+any corrupt header will be detected.
 @end itemize
 
 Trailing data are in no way part of the lzip file format, but tools
@@ -621,6 +665,35 @@ that of user-added data, they are expected to be ignored. In those cases
 where a file containing trailing data must be rejected, the option
 @samp{--trailing-error} can be used. @xref{--trailing-error}.
 
+Lziprecover facilitates the management of metadata stored as trailing
+data in lzip files. See the following examples:
+
+@noindent
+Example 1: Add a comment or description to a compressed file.
+
+@example
+# First append the comment as trailing data to a lzip file
+echo 'This file contains this and that' >> file.lz
+# This command prints the comment to standard output
+lziprecover --dump-tdata file.lz
+# This command outputs file.lz without the comment
+lziprecover --strip-tdata file.lz
+# This command removes the comment from file.lz
+lziprecover --remove-tdata file.lz
+@end example
+
+@sp 1
+@noindent
+Example 2: Add and verify a cryptographically secure hash. (This may be
+convenient, but a separate copy of the hash must be kept in a safe place
+to guarantee that both file and hash have not been maliciously replaced).
+
+@example
+sha256sum < file.lz >> file.lz
+lziprecover --strip-tdata file.lz | sha256sum -c \
+<(lziprecover --dump-tdata file.lz)
+@end example
+
 
 @node Examples
 @chapter A small tutorial with examples
 
@@ -658,7 +731,7 @@ Do this instead
 
 @sp 1
 @noindent
-Example 4: Decompress @samp{file.lz} partially until 10 KiB of
+Example 4: Decompress @samp{file.lz} partially until @w{10 KiB} of
 decompressed data are produced.
 
 @example
@@ -756,7 +829,9 @@ lziprecover source directory to build it.
 
 By default, unzcrash reads the specified file and then repeatedly
 decompresses it, increasing 256 times each byte of the compressed data,
-so as to test all possible one-byte errors.
+so as to test all possible one-byte errors. Note that it may take years
+or even centuries to test all possible one-byte errors in a large file
+(tens of MB).
 
 If the @code{--block} option is given, unzcrash reads the specified file
 and then repeatedly decompresses it, setting all bytes in each
@@ -801,10 +876,10 @@ See
 The format for running unzcrash is:
 
 @example
-unzcrash [@var{options}] "lzip -tv" @var{filename}.lz
+unzcrash [@var{options}] 'lzip -t' @var{file}.lz
 @end example
 
-Unzcrash supports the following options:
+unzcrash supports the following options:
 
 @table @code
 @item -h
@@ -835,24 +910,34 @@ The number of N-bit errors per byte (N = 1 to 8) is:
 
 @item -B[@var{size}][,@var{value}]
 @itemx --block[=@var{size}][,@var{value}]
-Test block errors of given @var{size} aligned to a @var{size}-byte
-boundary, simulating a whole sector I/O error. Block @var{size} defaults
-to 512 bytes. @var{value} defaults to 0.
+Test block errors of given @var{size}, simulating a whole sector I/O
+error. Block @var{size} defaults to 512 bytes. @var{value} defaults to
+0. By default, only blocks aligned to a @var{size}-byte boundary are
+tested, but this may be changed with the @code{--delta} option.
 
 @item -d @var{n}
 @itemx --delta=@var{n}
-Test only one of every @var{n} bytes, blocks or truncation sizes,
-instead of all of them.
+Test only one byte, block, or truncation size every @var{n} bytes,
+instead of all of them. If the @code{--block} option is given, @var{n}
+defaults to the block size. Else @var{n} defaults to 1. Values of
+@var{n} smaller than the block size will result in overlappinng blocks.
+(Which is convenient for testing because there are usually too few
+non-overlappinng blocks in a file).
 
 @item -e @var{position},@var{value}
 @itemx --set-byte=@var{position},@var{value}
 Set byte at @var{position} to @var{value} in the internal buffer after
-reading and testing @var{filename}.lz but before the first test call to
-the decompressor. If @var{value} is preceded by @samp{+}, it is added to
-the original value of the byte at @var{position}. If @var{value} is
-preceded by @samp{f} (flip), it is XORed with the original value of the
-byte at @var{position}. This option can be used to run tests with a
-changed dictionary size, for example.
+reading and testing @var{file}.lz but before the first test call to the
+decompressor. If @var{value} is preceded by @samp{+}, it is added to the
+original value of the byte at @var{position}. If @var{value} is preceded
+by @samp{f} (flip), it is XORed with the original value of the byte at
+@var{position}. This option can be used to run tests with a changed
+dictionary size, for example.
+
+@item -n
+@itemx --no-verify
+Skip initial verification of @var{file}.lz and @samp{zcmp}. May speed up
+things a lot when testing many (or large) known good files.
 
 @item -p @var{bytes}
 @itemx --position=@var{bytes}
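The rule for safe user-added text in the `@item Useful data` hunk above can be read as a byte-level predicate. The Python sketch below encodes that rule exactly as the manual words it. It is not lziprecover's own test, and the function name `is_safe_trailing_text` is hypothetical.

```python
def is_safe_trailing_text(text: bytes) -> bool:
    """Check appended text against the rule stated in the manual: none of
    its first four bytes may equal the byte at the same position in b"LZIP",
    and it may not contain any zero bytes (null characters)."""
    magic = b"LZIP"
    # zip() stops at the shorter sequence, so texts under 4 bytes also work.
    if any(t == m for t, m in zip(text, magic)):
        return False
    return b"\x00" not in text


print(is_safe_trailing_text(b"This file contains this and that\n"))  # prints True
print(is_safe_trailing_text(b"LZ notes\n"))  # prints False ('L' at position 0)
print(is_safe_trailing_text(b"xyIz\n"))      # prints False ('I' at position 2)
```

The comment appended in Example 1 passes this check. The `sha256sum` output appended in Example 2 also passes, because it starts with lowercase hexadecimal digits.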
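The `-D`/`--range-decompress` hunk defines a half-open byte range: bytes from `begin` up to `end - 1`, with positions starting at 0. It also says that only members containing part of that range are decompressed. The sketch below illustrates both ideas under these assumptions:

- It handles only the three range formats fully visible in this view; the fourth is cut off and is not handled.
- A bare `begin` is treated as "up to the end of the data".
- Multiplier suffixes are ignored.
- The per-member decompressed sizes are supplied as a hypothetical input list.

The names `parse_range` and `members_to_decompress` are illustrative only.

```python
from typing import List, Optional, Tuple


def parse_range(spec: str) -> Tuple[int, Optional[int]]:
    """Parse 'begin', 'begin-end' or 'begin,size' into a half-open
    interval (begin, end), meaning bytes begin .. end - 1.
    Assumption: a bare 'begin' means "up to the end of the data",
    signalled here by end = None."""
    if "," in spec:
        begin_text, size_text = spec.split(",", 1)
        begin = int(begin_text)
        return begin, begin + int(size_text)
    if "-" in spec:
        begin_text, end_text = spec.split("-", 1)
        return int(begin_text), int(end_text)
    return int(spec), None


def members_to_decompress(member_sizes: List[int], begin: int,
                          end: Optional[int]) -> List[int]:
    """Return the indices of the members whose decompressed data overlap
    [begin, end). member_sizes holds each member's decompressed size."""
    selected = []
    start = 0  # decompressed position of the member's first byte
    for index, size in enumerate(member_sizes):
        stop = start + size  # one past the member's last byte
        low = max(start, begin)
        high = stop if end is None else min(stop, end)
        if low < high:  # non-empty intersection with the requested range
            selected.append(index)
        start = stop
    return selected


# Three members of 100 decompressed bytes each; bytes 150..249 requested.
print(members_to_decompress([100, 100, 100], *parse_range("150,100")))  # prints [1, 2]
```

Because the range is half-open, a request for `0-100` touches only the first member, even though byte 100 is the first byte of the second member.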
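The `-e`/`--set-byte` hunk distinguishes three forms of `value`:

- A plain number sets the byte.
- A leading `+` adds the number to the original byte.
- A leading `f` XORs the number with the original byte.

The hypothetical Python helper below mirrors that description. Decimal-only parsing, the 0..255 range check and wrap-around on addition are assumptions of this sketch, not documented unzcrash behavior.

```python
def _parse_byte(text: str) -> int:
    """Parse a decimal byte value (assumed format) and check its range."""
    number = int(text)
    if not 0 <= number <= 255:
        raise ValueError(f"byte value out of range: {number}")
    return number


def apply_set_byte(buffer: bytearray, position: int, value: str) -> None:
    """Change buffer[position] in place as described for --set-byte:
    'N' sets the byte to N, '+N' adds N, 'fN' XORs it with N."""
    original = buffer[position]
    if value.startswith("+"):
        # Assumption: the sum wraps around modulo 256.
        buffer[position] = (original + _parse_byte(value[1:])) % 256
    elif value.startswith("f"):
        buffer[position] = original ^ _parse_byte(value[1:])
    else:
        buffer[position] = _parse_byte(value)


data = bytearray(b"\x10\x10\x10")
apply_set_byte(data, 0, "7")   # set
apply_set_byte(data, 1, "+5")  # add
apply_set_byte(data, 2, "f1")  # flip (XOR)
print(list(data))  # prints [7, 21, 17]
```

In the lzip format, offset 5 of each member header holds the coded dictionary size. A call such as `apply_set_byte(data, 5, "+1")` is therefore the analogue of the changed-dictionary-size test that the manual mentions.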
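The note added in the unzcrash hunk says a file of tens of MB may take years or even centuries. That figure follows from simple arithmetic. The estimate below assumes:

- a 20 MB compressed file;
- 255 altered values per byte, since the 256th increment restores the original byte;
- 0.5 s per trial decompression (an assumed figure, not a measurement).

Under those assumptions, the default one-byte test needs roughly:

```latex
\begin{align*}
N &= 255 \times (2 \times 10^{7}) \approx 5.1 \times 10^{9}\ \text{trials} \\
T &\approx N \times 0.5\ \text{s} \approx 2.55 \times 10^{9}\ \text{s} \approx 81\ \text{years}
\end{align*}
```

Testing only one byte every `n` bytes with `--delta` divides N, and therefore T, by about `n`.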