\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename lziprecover.info
@documentencoding ISO-8859-15
@settitle Lziprecover Manual
@finalout
@c %**end of header

@set UPDATED 8 January 2025
@set VERSION 1.25

@dircategory Compression
@direntry
* Lziprecover: (lziprecover). Data recovery tool for the lzip format
@end direntry


@ifnothtml
@titlepage
@title Lziprecover
@subtitle Data recovery tool for the lzip format
@subtitle for Lziprecover version @value{VERSION}, @value{UPDATED}
@author by Antonio Diaz Diaz

@page
@vskip 0pt plus 1filll
@end titlepage

@contents
@end ifnothtml

@ifnottex
@node Top
@top

This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).

@menu
* Introduction:: Purpose and features of lziprecover
* Invoking lziprecover:: Command-line interface
* Argument syntax:: By convention, options start with a hyphen
* File format:: Detailed format of the compressed file
* Data safety:: Protecting data from accidental loss
* Fec files:: Forward Error Correction
* Repairing one byte:: Fixing bit flips and similar errors
* Merging files:: Fixing several damaged copies
* Reproducing one sector:: Fixing a missing (zeroed) sector
* Tarlz:: Options supporting the tar.lz format
* File names:: Names of the files produced by lziprecover
* Trailing data:: Extra data appended to the file
* Examples:: A small tutorial with examples
* Unzcrash:: Testing the robustness of decompressors
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@end menu

@sp 1
Copyright @copyright{} 2009-2025 Antonio Diaz Diaz.

This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@end ifnottex


@node Introduction
@chapter Introduction
@cindex introduction

@uref{http://www.nongnu.org/lzip/lziprecover.html,,Lziprecover}
is a data recovery tool and decompressor for files in the lzip
compressed data format (.lz). Lziprecover also provides Forward Error
Correction (FEC) able to repair any kind of file.

Lziprecover is able to repair slightly damaged lzip files (up to one
single-byte error per member), produce a correct file by merging the good
parts of two or more damaged copies, reproduce a missing (zeroed) sector
using a reference file, extract data from damaged files, decompress files,
and test integrity of files.

Lziprecover can remove the damaged members from multimember files, for
example multimember tar.lz archives.

Lziprecover provides random access to the data in multimember files; it only
decompresses the members containing the desired data.

Lziprecover is not a replacement for regular backups, but a last line of
defense for the case where the backups are also damaged.

Lziprecover is able to provide unique data recovery capabilities because the
lzip format is extraordinarily safe. The simple and safe design of the file
format complements the embedded error detection provided by the LZMA data
stream. Any distance larger than the dictionary size acts as a forbidden
symbol, allowing the decompressor to detect the approximate position of
errors, and leaving little work for the check sequence (CRC and data sizes)
in the detection of errors. Lzip is usually able to detect all possible bit
flips in the compressed data without resorting to the check sequence. It
would be difficult to write an automatic recovery tool like lziprecover for
the gzip format and, as far as I know, none has ever been written.

A nice feature of the lzip format is that a corrupt byte is easier to repair
the nearer it is to the beginning of the file. Therefore, with the help of
lziprecover, losing an entire archive just because of a corrupt byte near
the beginning is a thing of the past.

Compression may be good for long-term archiving. For compressible data,
multiple compressed copies may provide redundancy in a more useful form and
may have a better chance of surviving intact than one uncompressed copy
using the same amount of storage space. This is especially true if the
format provides recovery capabilities like those of lziprecover, which is
able to find and combine the good parts of several damaged copies.

Lziprecover is able to recover or decompress files produced by any of the
compressors in the lzip family: lzip, plzip, minilzip/lzlib, clzip, and
pdlzip.

GNU ddrescue provides data recovery capabilities which nicely complement
those of lziprecover. If the cause of file corruption is a damaged medium,
the combination @w{GNU ddrescue + lziprecover} is the recommended option for
recovering data from damaged files. @xref{ddrescue-example},
@ref{ddrescue-example2}, and @ref{ddrescue-example3}, for examples.
@ifnothtml
@xref{Top,GNU ddrescue manual,,ddrescue},
@end ifnothtml
@ifhtml
See the
@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual}
@end ifhtml
for details about ddrescue.

If a file is too damaged for lziprecover to repair it, all the recoverable
data in all members of the file can be extracted with the following command
(the resulting file may contain errors and some garbage data may be produced
at the end of each damaged member):

@example
lziprecover -cd --ignore-errors file.lz > file
@end example

When recovering data, lziprecover takes as arguments the names of the
damaged files and writes zero or more recovered files depending on the
operation selected and whether the recovery succeeded or not. The damaged
files themselves are kept unchanged.

When decompressing or testing file integrity, lziprecover behaves like lzip
or lunzip.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
been compressed. Decompressed is used to refer to data which have undergone
the process of decompression.


@node Invoking lziprecover
@chapter Invoking lziprecover
@cindex invoking
@cindex options
@cindex usage
@cindex version

The format for running lziprecover is:

@example
lziprecover [@var{options}] [@var{files}]
@end example

@noindent
When decompressing or testing, a hyphen @samp{-} used as a @var{file}
argument means standard input. It can be mixed with other @var{files} and is
read just once, the first time it appears in the command line. If no file
names are specified, lziprecover decompresses from standard input to
standard output. Remember to prepend @file{./} to any file name beginning
with a hyphen, or use @samp{--}.

@noindent
lziprecover supports the following options: @xref{Argument syntax}.

@table @code
@item -h
@itemx --help
Print an informative help message describing the options and exit.

@item -V
@itemx --version
Print the version number of lziprecover on the standard output and exit.
This version number should be included in all bug reports.

@anchor{--trailing-error}
@item -a
@itemx --trailing-error
Exit with error status 2 if any remaining input is detected after
decompressing the last member. Such remaining input is usually trailing
garbage that can be safely ignored. @xref{concat-example}.

@item -A
@itemx --alone-to-lz
Convert lzma-alone files to lzip format without recompressing, just adding a
lzip header and trailer. The conversion minimizes the dictionary size of the
resulting file (and therefore the amount of memory required to decompress
it). Only streamed files with default LZMA properties can be converted;
non-streamed lzma-alone files lack the 'End Of Stream' marker required in
lzip files.

The name of the converted lzip file is derived from that of the original
lzma-alone file as follows:

@multitable {filename.lzma} {becomes} {anyothername.lz}
@item filename.lzma @tab becomes @tab filename.lz
@item filename.tlz @tab becomes @tab filename.tar.lz
@item anyothername @tab becomes @tab anyothername.lz
@end multitable
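
The renaming rules in the table above can be sketched in a few lines of
Python. This is only an illustration of the table, not lziprecover's own
code; the function name @code{converted_name} is hypothetical.

```python
def converted_name(name):
    """Derive the lzip file name from an lzma-alone file name.

    Hypothetical helper that mirrors the three rows of the table.
    """
    if name.endswith(".lzma"):
        # filename.lzma becomes filename.lz
        return name[:-len(".lzma")] + ".lz"
    if name.endswith(".tlz"):
        # filename.tlz becomes filename.tar.lz
        return name[:-len(".tlz")] + ".tar.lz"
    # any other name just gets the .lz extension appended
    return name + ".lz"
```

For instance, under these rules @code{converted_name("backup.tlz")} gives
@file{backup.tar.lz}.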

@item -b @var{bytes}
@itemx --block-size=@var{bytes}
When creating fec files, make the FEC block size a multiple of @var{bytes},
which must be a multiple of 512 not larger than @w{1 GiB}.
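
The constraint on @var{bytes} can be written as a one-line check. The
following Python sketch is only illustrative; @code{valid_block_size} is a
hypothetical name, and rejecting zero and negative values is an assumption
of the sketch, not something stated above.

```python
GIB = 1 << 30  # 1 GiB in bytes


def valid_block_size(size):
    """Return True if size is a multiple of 512 not larger than 1 GiB.

    Assumption of this sketch: the value must also be positive.
    """
    return 0 < size <= GIB and size % 512 == 0
```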

@anchor{--byte-repair}
@item -B
@itemx --byte-repair
Try to repair a @var{file} with small errors (up to one single-byte error
per member). If successful, a repaired copy is written to the file
@var{file}_fixed.lz. @var{file} is not modified at all. The exit status is 0
if the file could be repaired, 2 otherwise. @xref{Repairing one byte}, for a
complete description of the byte-repair mode.

@item -c
@itemx --stdout
Write decompressed data to standard output; keep input files unchanged. This
option (or @option{-o}) is needed when reading from a named pipe (fifo) or
from a device. Use it also to recover as much of the decompressed data as
possible when decompressing a corrupt file. @option{-c} overrides @option{-o}.
@option{-c} has no effect when merging, removing members, repairing,
reproducing, splitting, testing or listing.

@item -d
@itemx --decompress
Decompress the files specified. The integrity of the files specified is
checked. If a file does not exist, can't be opened, or the destination file
already exists and @option{--force} has not been specified, lziprecover
continues decompressing the rest of the files and exits with error status 1.
If a file fails to decompress, or is a terminal, lziprecover exits
immediately with error status 2 without decompressing the rest of the files.
A terminal is considered an uncompressed file, and therefore invalid. A
multimember file with one or more empty members is accepted if redirected to
standard input or if @option{-i} is given.

@item -D @var{range}
@itemx --range-decompress=@var{range}
Decompress only a range of bytes starting at decompressed byte position
@var{begin} and up to byte position @w{@var{end} - 1}. Byte positions start
at 0. The bytes produced are sent to standard output unless the option
@option{-o} is used. This option provides random access to the data in
multimember files; it only decompresses the members containing the desired
data. In order to guarantee the correctness of the data produced, all
members containing any part of the desired data are decompressed and their
integrity is checked.

@anchor{range-format}
Four formats of @var{range} are recognized, @samp{@var{begin}},
@samp{@var{begin}-@var{end}}, @samp{@var{begin},@var{size}}, and
@samp{,@var{size}}. If only @var{begin} is specified, @var{end} is taken as
the end of the file. If only @var{size} is specified, @var{begin} is taken
as the beginning of the file.
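
The four @var{range} formats can be illustrated with a short Python sketch
that turns a range string into a pair of byte positions. It is not
lziprecover's parser: @code{parse_range} and its @code{file_size} parameter
are hypothetical, and the sketch assumes plain decimal numbers only.

```python
def parse_range(text, file_size):
    """Return (begin, end) for a range string; end is exclusive.

    Accepted forms: 'begin', 'begin-end', 'begin,size' and ',size'.
    file_size is the decompressed size, used when end is not given.
    Assumption of this sketch: numbers are plain decimal integers.
    """
    if text.startswith(","):
        # ',size': begin is taken as the beginning of the file
        begin, end = 0, int(text[1:])
    elif "," in text:
        # 'begin,size'
        first, size = text.split(",", 1)
        begin = int(first)
        end = begin + int(size)
    elif "-" in text:
        # 'begin-end': the last byte produced is at position end - 1
        first, last = text.split("-", 1)
        begin, end = int(first), int(last)
    else:
        # 'begin' alone: end is taken as the end of the file
        begin, end = int(text), file_size
    if begin < 0 or end < begin:
        raise ValueError("invalid range: " + text)
    return begin, end
```

With a decompressed size of 1000 bytes, @samp{100,50} and @samp{100-150}
both describe the 50 bytes at positions 100 to 149.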

@anchor{--reproduce}
@item -e
@itemx --reproduce
Try to recover a missing (zeroed) sector in @var{file} using a reference
file and the same version of lzip that created @var{file}. If successful, a
repaired copy is written to the file @var{file}_fixed.lz. @var{file} is not
modified at all. The exit status is 0 if the member containing the zeroed
sector could be repaired, 2 otherwise. Note that @var{file}_fixed.lz may
still contain errors in the members following the one repaired.
@xref{Reproducing one sector}, for a complete description of the reproduce
mode.

@item --lzip-level=@var{digit}|a|m[@var{length}]
Try only the given compression level or match length limit when reproducing
a zeroed sector. @option{--lzip-level=a} tries all the compression levels
@w{(0 to 9)}, while @option{--lzip-level=m} tries all the match length limits
@w{(5 to 273)}.

@item --lzip-name=@var{name}
Set the name of the lzip executable used by @option{--reproduce}. If
@option{--lzip-name} is not specified, @samp{lzip} is used.

@item --reference-file=@var{file}
Set the reference file used by @option{--reproduce}. It must contain the
uncompressed data corresponding to the missing compressed data of the zeroed
sector, plus some context data before and after them.

@item -f
@itemx --force
Force overwrite of output files.

@item -F create[@var{n}]|repair|test|list
@itemx --fec=create[@var{n}]|repair|test|list
Create fec files, or repair or test files using previously created fec
files, or list the contents of fec files. The argument (create, repair,
test, or list) can be abbreviated even to a single letter. Option
@option{-i} is required to repair or test a file using a corrupt fec file,
or to list a corrupt fec file. @xref{Fec files}.

@var{n} is the number of FEC blocks to be created. The amount of FEC data to
be created may also be specified as a percentage from 0.003% to 100%, or as
a number of bytes followed by a @samp{B} (4096B, 16KiB, etc). If @var{n} is
not specified, it defaults to @samp{8} (8 FEC blocks). (Because, when was
the last time you saw more than 8 bad sectors affecting the same file?)

@option{--fec=create} writes the FEC data created to @var{file}.fec unless
option @option{-c} or @option{-o} is specified. If a fec file can't be
created, lziprecover exits immediately with error status 1 without trying to
create the rest of the files.

@option{--fec=repair} and @option{--fec=test} read the FEC data from
@var{file}.fec unless @option{--fec-file} is specified. @option{--fec=repair}
writes the repaired file to @var{file}_fixed unless option @option{-c} or
@option{-o} is specified. @xref{File names}. If a file fails to repair,
lziprecover exits immediately with error status 2 without repairing the rest
of the files.

@item -0 .. -9
FEC fragmentation level. Defaults to @option{-9}. Level @option{-0} is the
fastest; it creates FEC data using GF(2^8), maybe with large blocks. Levels
@option{-1} to @option{-9} use GF(2^8) or GF(2^16) as required, with
increasing amounts of smaller blocks.

@item --fec-file=@var{file}[/]
When repairing or testing, read FEC data from @var{file}. If @var{file} ends
with a slash, it is interpreted as the name of a directory containing the
fec file(s).

@item -i
@itemx --ignore-errors
Ignore non-fatal errors.@*
Make @option{--decompress}, @option{--test}, and @option{--range-decompress}
ignore format and data errors and continue decompressing the remaining
members in the file; keep input files unchanged. For example, the commands
@w{@samp{lziprecover -cd -i file.lz > file}} or
@w{@samp{lziprecover -D0 -i file.lz > file}} decompress all the recoverable
data in all members of @file{file.lz} without having to split it first. The
@w{@samp{-cd -i}} method resyncs to the next member header after each error,
and is immune to some format errors that make @w{@samp{-D0 -i}} fail. The
range decompressed may be smaller than the range requested, because of the
errors. The exit status is set to 0 unless other errors are found (I/O
errors, for example).

Make @option{--fec=repair} and @option{--fec=test} ignore errors in the fec
file and return with exit status 0 if the repaired/protected file passes the
test, even if corrupt packets or trailing garbage are found in the fec file.
Make @option{--fec=list} ignore errors in the fec files.

Make @option{--list}, @option{--dump}, @option{--remove}, and @option{--strip}
ignore format errors. The sizes of the members with errors (especially the
last) may be wrong.

@item -k
@itemx --keep
Keep (don't delete) input files during decompression or conversion from
lzma-alone.

@item -l
@itemx --list
Print the uncompressed size, compressed size, and percentage saved of the
files specified. Trailing data are ignored. The values produced are correct
even for multimember files. If more than one file is given, a final line
containing the cumulative sizes is printed. With @option{-v}, the dictionary
size, the number of members in the file, and the amount of trailing data (if
any) are also printed. With @option{-vv}, the positions and sizes of each
member in multimember files are also printed. A multimember file with one or
more empty members is accepted if redirected to standard input or if
@option{-i} is given. With @option{-i}, format errors are ignored, and with
@option{-ivv}, gaps between members are shown. The member numbers start at 1
and coincide with the file numbers produced by @option{--split}.

If any file is damaged, does not exist, can't be opened, or is not regular,
the final exit status is @w{> 0}. @option{-lq} can be used to check quickly
(without decompressing) the structural integrity of the files specified.
(Use @option{--test} to check the data integrity). @option{-alq}
additionally checks that none of the files specified contain trailing data.

@item -m
@itemx --merge
Try to produce a correct file by merging the good parts of two or more
damaged copies. If successful, a repaired copy is written to the file
@var{file}_fixed.lz. The exit status is 0 if a correct file could be
produced, 2 otherwise. @xref{Merging files}, for a complete description of
the merge mode.

@item -n @var{n}
@itemx --threads=@var{n}
Set the maximum number of worker threads for @option{--fec=create},
overriding the system's default. Valid values range from 1 to as many as
your system can support. If this option is not used, lziprecover tries to
detect the number of processors in the system and use it as default value.
@w{@samp{lziprecover --help}} shows the system's default value.

@item -o @var{file}[/]
@itemx --output=@var{file}[/]
If repairing, place the repaired output into @var{file} instead of into
@var{file}_fixed.lz. If splitting, the names of the files produced are in
the form @file{rec1@var{file}}, @file{rec2@var{file}}, etc.

If creating FEC data and @option{-c} has not been also specified, write the
FEC data to @var{file}. If @var{file} ends with a slash, it is interpreted
as the name of a directory where the fec file(s) will be written to. In this
case, the fec file names are composed by replacing the prefix preceding the
last slash of each file name specified in the command line with @var{file}
(or prepending @var{file} if the file name does not contain a slash), and
appending the extension @file{.fec}.
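
The way fec file names are composed when @var{file} names a directory can be
sketched as follows. This Python fragment is only an illustration;
@code{fec_name} is a hypothetical helper, and it assumes that the slash
ending the directory name is the only one placed before the base name.

```python
def fec_name(input_name, directory):
    """Compose the fec file name for input_name inside directory.

    directory must end with a slash, as in '-o fecdir/'.
    Assumption of this sketch: the prefix up to and including the last
    slash of input_name is replaced by directory.
    """
    # Keep only what follows the last slash (the whole name if none)
    base = input_name.rsplit("/", 1)[-1]
    return directory + base + ".fec"
```

Under these assumptions, both @file{a/b/c.tar.lz} and @file{c.tar.lz} map to
@file{fecdir/c.tar.lz.fec} when the directory is @file{fecdir/}.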

Else, if @option{-c} has not been also specified, write the (de)compressed
output to @var{file}, automatically creating any missing parent directories;
keep input files unchanged. This option (or @option{-c}) is needed when
reading from a named pipe (fifo) or from a device. @w{@option{-o -}} is
equivalent to @option{-c}. @option{-o} has no effect when testing or listing.

@item -q
@itemx --quiet
Quiet operation. Suppress all messages.

@item -r
@itemx --recursive
When creating or reading fec files (but not when listing), for each directory
operand, read and process all files in that directory, recursively. Follow
symbolic links given in the command line, but skip symbolic links that are
encountered recursively. Ignore files and directories named @file{fec} or
@file{*[-._]fec}.

@item -R
@itemx --dereference-recursive
When creating or reading fec files (but not when listing), for each directory
operand, read and process all files in that directory, recursively,
following all symbolic links. Ignore files and directories named @file{fec}
or @file{*[-._]fec}.

@item -s
@itemx --split
Search for members in @var{file} and write each member in its own file. Gaps
between members are detected and each gap is saved in its own file. Trailing
data (if any) are saved alone in the last file. You can then use
@w{@samp{lziprecover -t}} to test the integrity of the resulting files,
decompress those which are undamaged, and try to repair or partially
decompress those which are damaged. Gaps may contain garbage or may be
members with corrupt headers or trailers. If other lziprecover functions
fail to work on a multimember @var{file} because of damage in headers or
trailers, try to split @var{file} and then work on each member individually.

The names of the files produced are in the form @file{rec1@var{file}},
@file{rec2@var{file}}, etc, and are designed so that the use of wildcards
in subsequent processing, for example,
@w{@samp{lziprecover -cd rec*@var{file} > recovered_data}}, processes the
files in the correct order. The number of digits used in the names varies
depending on the number of members in @var{file}.
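
One way to obtain names that sort correctly under a wildcard is to pad every
number with zeros to a common width. The Python sketch below shows the idea.
It is an assumption about the naming scheme, not a description of
lziprecover's code, and @code{split_names} is a hypothetical function.

```python
def split_names(file_name, pieces):
    """Return the names rec1file, rec2file, ... for a number of pieces.

    Assumption of this sketch: numbers are zero-padded to the width of
    the largest one, so that lexical order equals numeric order.
    """
    width = len(str(pieces))
    names = []
    for number in range(1, pieces + 1):
        names.append("rec" + str(number).zfill(width) + file_name)
    return names
```

Without the padding, a shell would expand @file{rec10file.lz} before
@file{rec2file.lz} and the members would be concatenated out of order.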

@item -t
@itemx --test
Check integrity of the files specified, but don't decompress them. This
really performs a trial decompression and throws away the result. Use it
together with @option{-v} to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal, lziprecover
continues testing the rest of the files. A final diagnostic is shown at
verbosity level 1 or higher if any file fails the test when testing multiple
files. A multimember file with one or more empty members is accepted if
redirected to standard input or if @option{-i} is given.

@item -v
@itemx --verbose
Verbose mode.@*
When decompressing or testing, further -v's (up to 4) increase the verbosity
level, showing status, compression ratio, dictionary size, trailer contents
(CRC, data size, member size), and up to 6 bytes of trailing data (if any)
both in hexadecimal and as a string of printable ASCII characters.@*
Two or more @option{-v} options show the progress of decompression.@*
In other modes, increasing verbosity levels show final status, progress of
operations, and extra information (for example, the failed areas).

@item --dump=[@var{member_list}][:damaged][:empty][:tdata]
Dump the members listed, the damaged members (if any), the empty members (if
any), or the trailing data (if any) of one or more regular multimember files
to standard output, or to a file if the option @option{-o} is used. If more
than one file is given, the elements dumped from all the files are
concatenated. If a file does not exist, can't be opened, or is not regular,
lziprecover continues processing the rest of the files. If the dump fails in
one file, lziprecover exits immediately without processing the rest of the
files. Only @option{--dump=tdata} can write to a terminal.
@option{--dump=damaged} implies @option{--ignore-errors}.

The argument to @option{--dump} is a colon-separated list of the following
element specifiers: a member list (1,3-6), a reverse member list (r1,3-6),
and the strings "damaged", "empty", and "tdata" (which may be shortened to
'd', 'e', and 't' respectively). A member list selects the members (or gaps)
listed, whose numbers coincide with those shown by @option{--list}. A reverse
member list selects the members listed counting from the last member in the
file (r1). Negated versions of both kinds of lists exist (^1,3-6:r^1,3-6)
which select all the members except those in the list. The strings
"damaged", "empty", and "tdata" select the damaged members, the empty
members (those with a data size = 0), and the trailing data respectively. If
the same member is selected more than once, for example by @samp{1:r1} in a
single-member file, it is dumped just once. See the following examples:

@multitable {@code{3,12:damaged:tdata}} {members 3, 12, damaged members, trailing data}
@headitem @code{--dump} argument @tab Elements dumped
@item @code{1,3-6} @tab members 1, 3, 4, 5, 6
@item @code{r1-3} @tab last 3 members in file
@item @code{^13,15} @tab all but 13th and 15th members in file
@item @code{r^1} @tab all but last member in file
@item @code{damaged} @tab all damaged members in file
@item @code{empty} @tab all empty members in file
@item @code{tdata} @tab trailing data
@item @code{1-5:r1:tdata} @tab members 1 to 5, last member, trailing data
@item @code{damaged:tdata} @tab damaged members, trailing data
@item @code{3,12:damaged:tdata} @tab members 3, 12, damaged members, trailing data
@end multitable
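
The member list syntax can be made concrete with a small Python sketch that
expands an argument into the member numbers it selects. It only illustrates
the rules described above: @code{parse_list} and @code{select_members} are
hypothetical names, @code{count} stands for the number of members in the
file, and the sketch assumes that numbers beyond the last member are
silently ignored. The strings @samp{damaged}, @samp{empty}, and
@samp{tdata} are skipped because resolving them requires reading the file.

```python
def parse_list(text, count):
    """Expand one member list such as '1,3-6', 'r1-3', '^13,15' or 'r^1'."""
    reverse = text.startswith("r")
    if reverse:
        text = text[1:]
    negate = text.startswith("^")
    if negate:
        text = text[1:]
    chosen = set()
    for part in text.split(","):
        first, sep, last = part.partition("-")
        low = int(first)
        high = int(last) if sep else low
        for number in range(low, high + 1):
            # In a reverse list, 1 means the last member in the file
            member = count + 1 - number if reverse else number
            # Assumption: out-of-range numbers are ignored
            if 1 <= member <= count:
                chosen.add(member)
    if negate:
        # A negated list selects every member except those listed
        chosen = set(range(1, count + 1)) - chosen
    return chosen


def select_members(spec, count):
    """Return the sorted member numbers selected by a --dump argument."""
    selected = set()
    for element in spec.split(":"):
        if element in ("damaged", "empty", "tdata", "d", "e", "t"):
            continue  # needs the file contents; outside this sketch
        # The set union makes a member selected twice appear just once
        selected |= parse_list(element, count)
    return sorted(selected)
```

For a file with 10 members, @samp{r1-3} expands to members 8, 9, and 10,
matching the second row of the table.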

@item --remove=[@var{member_list}][:damaged][:empty][:tdata]
Remove the members listed, the damaged members (if any), the empty members
(if any), or the trailing data (if any) from regular multimember files in
place. The date of each file modified is preserved if possible. If all
members in a file are selected to be removed, the file is left unchanged and
the exit status is set to 2. If a file does not exist, can't be opened, is
not regular, or is left unchanged, lziprecover continues processing the rest
of the files. In case of I/O error, lziprecover exits immediately without
processing the rest of the files. See @option{--dump} above for a description
of the argument.

This option may be dangerous even if only the trailing data are being
removed because the file may be corrupt or the trailing data may contain a
forbidden combination of characters. @xref{Trailing data}. It is safer to
send the output of @option{--strip} to a temporary file, check it, and then
copy it over the original file. But if you prefer @option{--remove} because of
its more efficient in-place removal, it is advisable to make a backup before
attempting the removal. At least check that @w{@samp{lzip -cd file.lz | wc -c}}
and the uncompressed size shown by @w{@samp{lzip -l file.lz}} match before
attempting the removal of trailing data.

@item --strip=[@var{member_list}][:damaged][:empty][:tdata]
Copy one or more regular multimember files to standard output (or to a file
if the option @option{-o} is used), stripping the members listed, the
damaged members (if any), the empty members (if any), or the trailing data
(if any) from each file. If all members in a file are selected to be
stripped, the trailing data (if any) are also stripped even if @samp{tdata}
is not specified. If more than one file is given, the files are
concatenated. In this case the trailing data are also stripped from all but
the last file even if @samp{tdata} is not specified. If a file does not
exist, can't be opened, or is not regular, lziprecover continues processing
the rest of the files. If a file fails to copy, lziprecover exits
immediately without processing the rest of the files. See @option{--dump}
above for a description of the argument.

@item --loose-trailing
When decompressing, testing, or listing, allow trailing data whose first
bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
'corrupt header' error and the cause is not actually a corrupt header.

@item --nonzero-repair
Repair in place a nonzero first LZMA byte in the files specified. With
@option{-v}, print the number of members repaired. The date of each file
modified is preserved if possible.

@end table

@noindent
lziprecover also supports the following debug options (for experts):

@table @code
@item -E @var{range}[,@var{sector_size}]
@itemx --debug-reproduce=@var{range}[,@var{sector_size}]
Load the compressed @var{file} into memory, set all bytes in the positions
specified by @var{range} to 0, and try to reproduce a correct compressed
file. @xref{--reproduce}. @xref{range-format}, for a description of
@var{range}. If a @var{sector_size} is specified, set each sector to 0 in
sequence and try to reproduce the file, printing to standard output final
statistics of the number of sectors reproduced successfully. Exit with
nonzero status only in case of fatal error.

@item -F dc@var{n}
@itemx --fec=dc@var{n}
Simulate FEC repair of all combinations of @var{n} zeroed block errors
spread along the whole input file.

@item -F dz@var{range}[:@var{range}]...
@itemx --fec=dz@var{range}[:@var{range}]...
Simulate FEC repair of one or more zeroed block(s) in the input file at the
@var{range}s given. The @var{range}s may be unordered and overlapping.
Lziprecover sorts and joins them as needed. @xref{range-format}, for a
description of @var{range}.

@item -F dZ@var{size}[,@var{delta}]
@itemx --fec=dZ@var{size}[,@var{delta}]
Simulate FEC repair of all possible zeroed blocks of size @var{size} in the
input file. @var{delta} defaults to @var{size}. Values of @var{delta}
smaller than @var{size} result in overlapping blocks.

@item -M
@itemx --md5sum
Print to standard output the MD5 digests of the input @var{files} one per
line in the same format produced by the @command{md5sum} tool. Lziprecover
uses MD5 digests to check the result of some operations. This option can be
used to test the correctness of lziprecover's implementation of the MD5
algorithm.

@item -S[@var{value}]
@itemx --nrep-stats[=@var{value}]
Compare the frequency of sequences of N repeated bytes of a given
@var{value} in the compressed LZMA streams of the input @var{files} with the
frequency expected for random data (1 / 2^(8N)). If @var{value} is not
specified, print the frequency of repeated sequences of all possible byte
values. Print cumulative data for all the files, followed by the name of the
first file with the longest sequence.
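
The expected frequency quoted above is easy to evaluate exactly. The
following Python sketch only computes the formula 1 / 2^(8N); the function
name @code{expected_frequency} is hypothetical, and nothing here describes
how lziprecover counts the sequences.

```python
from fractions import Fraction


def expected_frequency(n):
    """Frequency expected in random data for n repeated bytes of one value.

    Each byte matches a given value with probability 1/256 = 1 / 2^8,
    so n independent bytes all match with probability 1 / 2^(8n).
    """
    return Fraction(1, 2 ** (8 * n))
```

For N = 2 the result is 1/65536, so a given byte value repeated twice is
expected about once in every 65536 positions of random data.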

@anchor{--unzcrash}
@item -U 1|B@var{size}
@itemx --unzcrash=1|B@var{size}
With argument @samp{1}, test 1-bit errors in the LZMA stream of the
compressed input @var{file} like the command
@w{@samp{unzcrash -b1 -p7 -s-20 'lzip -t' @var{file}}} but in memory, and
therefore much faster (30 to 50 times faster). @xref{Unzcrash}. This option
tests all the members independently in a multimember file, skipping headers
and trailers. If a decompression succeeds, the decompressed output is
compared with the decompressed output of the original @var{file} using MD5
digests. @var{file} must not contain errors and must decompress correctly
for the comparisons to work.

With argument @samp{B}, test zeroed sectors (blocks of bytes) in the LZMA
stream of the compressed input @var{file} like the command
@w{@samp{unzcrash --block=@var{size} -d1 -p7 -s-(@var{size}+20) 'lzip -t' @var{file}}}
but in memory, and therefore much faster. Testing and comparisons work just
like with the argument @samp{1} explained above.

By default @option{--unzcrash} only prints the interesting cases: CRC
mismatches, size mismatches, unsupported marker codes, unexpected EOFs,
apparently successful decompressions, and decoder errors detected 50_000 or
more bytes beyond the byte (or the start of the block) being tested. At
verbosity level 1 (-v) it also prints decoder errors detected 10_000 or more
bytes beyond the byte being tested. At verbosity level 2 (-vv) it prints all
cases for 1-bit errors or the decoder errors detected beyond the end of the
block for zeroed blocks.

@item -W @var{position},@var{value}
@itemx --debug-decompress=@var{position},@var{value}
Load the compressed @var{file} into memory, set the byte at @var{position}
to @var{value}, and decompress the modified compressed data to standard
output. If the damaged member can be decompressed to the end (just fails
with a CRC mismatch), the members following it are also decompressed.
@xref{--set-byte}, for a description of @var{value}.

@item -X[@var{position},@var{value}]
@itemx --show-packets[=@var{position},@var{value}]
Load the compressed @var{file} into memory, optionally set the byte at
@var{position} to @var{value}, decompress the modified compressed data
(discarding the output), and print to standard output descriptions of the
LZMA packets being decoded. @xref{--set-byte}, for a description of @var{value}.

@item -Y @var{range}
@itemx --debug-delay=@var{range}
Load the compressed @var{file} into memory and then repeatedly decompress
it, incrementing 256 times each byte in the subset of the compressed data
positions specified by @var{range}, so as to test all possible one-byte
errors. For each decompression error find the error detection delay and
print to standard output the maximum delay. The error detection delay is the
difference between the position of the error and the position where the
decoder realized that the data contains an error. @xref{range-format}, for a
description of @var{range}.

@item -Z @var{position},@var{value}
@itemx --debug-byte-repair=@var{position},@var{value}
Load the compressed @var{file} into memory, set the byte at @var{position}
to @var{value}, and then try to repair the byte error. @xref{--byte-repair}.
@xref{--set-byte}, for a description of @var{value}.
|
|
|
|
@item --gf16
|
|
Force the use of GF(2^16) when creating FEC blocks even if the number of
|
|
blocks fits in GF(2^8).
|
|
|
|
@end table
|
|
|
|
Numbers given as arguments to options may be expressed in decimal,
|
|
hexadecimal, or octal (using the same syntax as integer constants in C++),
|
|
and may be followed by a multiplier and an optional @samp{B} for "byte".
|
|
|
|
Table of SI and binary prefixes (unit multipliers):
|
|
|
|
@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
|
|
@headitem Prefix @tab Value @tab | @tab Prefix @tab Value
|
|
@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024)
|
|
@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20)
|
|
@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30)
|
|
@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40)
|
|
@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50)
|
|
@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60)
|
|
@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70)
|
|
@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80)
|
|
@item R @tab ronnabyte (10^27) @tab | @tab Ri @tab robibyte (2^90)
|
|
@item Q @tab quettabyte (10^30) @tab | @tab Qi @tab quebibyte (2^100)
|
|
@end multitable
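
The rules above can be sketched in Python as follows. This is only an
illustration, not lziprecover's actual parser: the function name
@code{parse_size} is hypothetical, and the sketch assumes that the optional
@samp{B} is accepted only after a multiplier.

@verbatim
```python
import re

# Exponents of the prefixes listed in the table above.
EXPONENT = {"k": 1, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5,
            "E": 6, "Z": 7, "Y": 8, "R": 9, "Q": 10}

NUMBER = re.compile(
    r"(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)"   # hex, octal, or decimal
    r"(?:(k|Ki|[MGTPEZYRQ]i?)B?)?$")             # multiplier, optional 'B'

def parse_size(text):
    """Return the value of strings such as '4096', '0x1000', '64Ki', '2MB'."""
    match = NUMBER.match(text)
    if match is None:
        raise ValueError("bad or missing numerical argument: " + text)
    digits, prefix = match.groups()
    if digits[:2] in ("0x", "0X"):
        value = int(digits, 16)
    elif digits.startswith("0"):
        value = int(digits, 8)        # C++ octal constant; '0' itself is 0
    else:
        value = int(digits)
    if prefix is None:
        return value
    base = 1024 if prefix.endswith("i") else 1000
    return value * base ** EXPONENT[prefix[0]]
```
@end verbatim

For example, @code{parse_size("64Ki")} returns 65536 and
@code{parse_size("2MB")} returns 2000000.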
|
|
|
|
@sp 1
|
|
Exit status: 0 for a normal exit, 1 for environmental problems
|
|
(file not found, invalid command-line options, I/O errors, etc.), 2 to
|
|
indicate a corrupt or invalid input file, 3 for an internal consistency
|
|
error (e.g., bug) which caused lziprecover to panic.
|
|
|
|
|
|
@node Argument syntax
|
|
@chapter Syntax of command-line arguments
|
|
@cindex argument syntax
|
|
|
|
POSIX recommends these conventions for command-line arguments.
|
|
|
|
@itemize @bullet
|
|
@item A command-line argument is an option if it begins with a hyphen
|
|
(@samp{-}).
|
|
|
|
@item Option names are single alphanumeric characters.
|
|
|
|
@item Certain options require an argument.
|
|
|
|
@item An option and its argument may or may not appear as separate tokens.
|
|
(In other words, the whitespace separating them is optional).
|
|
Thus, @w{@option{-o foo}} and @option{-ofoo} are equivalent.
|
|
|
|
@item One or more options without arguments, followed by at most one option
|
|
that takes an argument, may follow a hyphen in a single token.
|
|
Thus, @option{-abc} is equivalent to @w{@option{-a -b -c}}.
|
|
|
|
@item Options typically precede other non-option arguments.
|
|
|
|
@item The argument @samp{--} terminates all options; any following arguments
|
|
are treated as non-option arguments, even if they begin with a hyphen.
|
|
|
|
@item A token consisting of a single hyphen character is interpreted as an
|
|
ordinary non-option argument. By convention, it is used to specify standard
|
|
input, standard output, or a file named @samp{-}.
|
|
@end itemize
|
|
|
|
@noindent
|
|
GNU adds @dfn{long options} to these conventions:
|
|
|
|
@itemize @bullet
|
|
@item A long option consists of two hyphens (@samp{--}) followed by a name
|
|
made of alphanumeric characters and hyphens. Option names are typically one
|
|
to three words long, with hyphens to separate words. Abbreviations can be
|
|
used for the long option names as long as the abbreviations are unique.
|
|
|
|
@item A long option and its argument may or may not appear as separate
|
|
tokens. In the latter case they must be separated by an equal sign @samp{=}.
|
|
Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent.
|
|
@end itemize
|
|
|
|
@noindent
|
|
The syntax of options with an optional argument is
|
|
@option{-<short_option><argument>} (without whitespace), or
|
|
@option{--<long_option>=<argument>}.
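
Python's standard @code{getopt} module follows most of the conventions
listed above, so it can be used to try them out. The sketch below is only an
illustration (lziprecover has its own argument parser), and the option names
@samp{-a}, @samp{-b}, @samp{-c}, @samp{-o}, @samp{--foo}, and
@samp{--verbose} are hypothetical.

@verbatim
```python
import getopt

SHORT = "abco:"               # -a, -b, -c take no argument; -o requires one
LONG = ["foo=", "verbose"]    # --foo requires an argument; --verbose does not

def parse(args):
    """Return (options, operands) for a list of command-line tokens."""
    return getopt.getopt(args, SHORT, LONG)

# An option and its argument may or may not be separate tokens.
assert parse(["-o", "foo"]) == parse(["-ofoo"]) == ([("-o", "foo")], [])

# Several options without arguments may follow one hyphen.
assert parse(["-abc"]) == parse(["-a", "-b", "-c"])

# A long option and its argument are joined with '=' or given separately.
assert parse(["--foo", "bar"]) == parse(["--foo=bar"])

# Unique abbreviations of long option names are accepted.
assert parse(["--verb"]) == ([("--verbose", "")], [])

# '--' terminates the options; a lone '-' is an ordinary argument.
assert parse(["-a", "--", "-b"]) == ([("-a", "")], ["-b"])
assert parse(["-"]) == ([], ["-"])
```
@end verbatim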
|
|
|
|
|
|
@node File format
|
|
@chapter File format
|
|
@cindex file format
|
|
|
|
Perfection is reached, not when there is no longer anything to add, but
|
|
when there is no longer anything to take away.@*
|
|
--- Antoine de Saint-Exupery
|
|
|
|
In the diagram below, a box like this:
|
|
|
|
@verbatim
|
|
+---+
|
|
| | <-- the vertical bars might be missing
|
|
+---+
|
|
@end verbatim
|
|
|
|
represents one byte; a box like this:
|
|
|
|
@verbatim
|
|
+==============+
|
|
| |
|
|
+==============+
|
|
@end verbatim
|
|
|
|
represents a variable number of bytes.
|
|
|
|
@noindent
|
|
A lzip file consists of one or more independent "members" (compressed data
|
|
sets). The members simply appear one after another in the file, with no
|
|
additional information before, between, or after them. Each member can
|
|
encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data.
|
|
The size of a multimember file is unlimited. Empty members (data size = 0)
|
|
are not allowed in multimember files.
|
|
|
|
Each member has the following structure:
|
|
|
|
@verbatim
|
|
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
|
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
@end verbatim
|
|
|
|
All multibyte values are stored in little endian order.
|
|
|
|
@table @samp
|
|
@item ID string (the "magic" bytes)
|
|
A four byte string, identifying the lzip format, with the value "LZIP"
|
|
(0x4C, 0x5A, 0x49, 0x50).
|
|
|
|
@item VN (version number, 1 byte)
|
|
Just in case something needs to be modified in the future. 1 for now.
|
|
|
|
@item DS (coded dictionary size, 1 byte)
|
|
The dictionary size is calculated by taking a power of 2 (the base size)
|
|
and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
|
|
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
|
|
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
|
from the base size to obtain the dictionary size.@*
|
|
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
|
|
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
|
|
|
@item LZMA stream
|
|
The LZMA stream, terminated by an 'End Of Stream' marker. Uses default values
|
|
for encoder properties.
|
|
@ifnothtml
|
|
@xref{Stream format,,,lzip},
|
|
@end ifnothtml
|
|
@ifhtml
|
|
See
|
|
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format}
|
|
@end ifhtml
|
|
for a complete description.
|
|
|
|
@item CRC32 (4 bytes)
|
|
Cyclic Redundancy Check (CRC) of the original uncompressed data.
|
|
|
|
@item Data size (8 bytes)
|
|
Size of the original uncompressed data.
|
|
|
|
@item Member size (8 bytes)
|
|
Total size of the member, including header and trailer. This field acts
|
|
as a distributed index, improves the checking of stream integrity, and
|
|
facilitates the safe recovery of undamaged members from multimember files.
|
|
Lzip limits the member size to @w{2 PiB} to prevent the data size field from
|
|
overflowing.
|
|
@end table
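
As a minimal sketch of the layout described above, the following Python
code reads the fixed-size fields of one complete member held in memory. The
function names are hypothetical; the LZMA stream is not decoded and the CRC
is not verified.

@verbatim
```python
import struct

def decode_dict_size(ds):
    """Decode the DS byte into a dictionary size in bytes."""
    log2_base = ds & 0x1F              # bits 4-0: log2 of the base size
    numerator = ds >> 5                # bits 7-5: sixteenths to subtract
    base = 1 << log2_base
    size = base - numerator * (base // 16)
    if not 12 <= log2_base <= 29 or size < 4096:
        raise ValueError("invalid coded dictionary size")
    return size

def parse_member(member):
    """Return the header and trailer fields of one complete member."""
    magic, version, ds = struct.unpack_from("<4sBB", member, 0)
    if magic != b"LZIP":
        raise ValueError("bad magic bytes")
    # The trailer is the last 20 bytes: CRC32, data size, member size.
    crc32, data_size, member_size = struct.unpack_from(
        "<IQQ", member, len(member) - 20)
    return {"version": version,
            "dictionary_size": decode_dict_size(ds),
            "crc32": crc32,
            "data_size": data_size,
            "member_size": member_size}
```
@end verbatim

With the example value from the table, @code{decode_dict_size(0xD3)}
returns 327680 (320 KiB).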
|
|
|
|
|
|
@node Data safety
|
|
@chapter Protecting data from accidental loss
|
|
@cindex data safety
|
|
|
|
It is a fact of life that sometimes data becomes corrupt. Software has
|
|
errors. Hardware may misbehave or fail. RAM may be struck by a cosmic ray.
|
|
This is why sufficiently safe integrity checking is needed in compressed
|
|
formats, and the reason why a data recovery tool is sometimes needed.
|
|
|
|
There are 3 main types of data corruption that may cause data loss:
|
|
single-byte errors, multibyte errors (generally affecting a whole sector
|
|
in a block device), and total device failure.
|
|
|
|
The two most effective methods of protecting data from accidental loss are
|
|
backup copies and Forward Error Correction (FEC). Both methods can be used
|
|
simultaneously, and both are supported by lziprecover.
|
|
|
|
Lziprecover protects natively against single-byte errors as long as file
|
|
integrity is checked frequently enough that a second single-byte error does
|
|
not develop in the same member before the first one is repaired.
|
|
@xref{Repairing one byte}.
|
|
|
|
Lziprecover protects against multibyte errors in 3 cases: if a fec file is
|
|
available (@pxref{Fec files}), if at least one backup copy of the file is
|
|
available (@pxref{Merging files}), or if the error is a zeroed sector and
|
|
the uncompressed data corresponding to the zeroed sector are available
|
|
(@pxref{Reproducing one sector}). FEC is best. Otherwise, if you can choose
|
|
between merging and reproducing, try merging first because it is usually
|
|
faster, easier to use, and has a high probability of success.
|
|
|
|
Lziprecover can't help in case of device failure. The only remedy for total
|
|
device failure is storing backup copies in separate media.
|
|
|
|
The extraordinary safety of the lzip format allows lziprecover to use the
|
|
redundancy that occurs naturally when making compressed backups. Lziprecover
|
|
can recover data that would not be recoverable from files compressed in
|
|
other formats. See these two examples of the data recovery capabilities
|
|
offered by lziprecover:
|
|
|
|
@menu
|
|
* Merging with a backup:: Recovering a file using a damaged backup
|
|
* Reproducing a mailbox:: Recovering new messages using an old backup
|
|
@end menu
|
|
|
|
|
|
@node Merging with a backup
|
|
@section Recovering a file using a damaged backup
|
|
@cindex merging with a backup
|
|
|
|
Let's suppose that you made a compressed backup of your valuable scientific
|
|
data and stored two copies on separate media. Years later you notice that
|
|
both copies are corrupt.
|
|
|
|
If you compressed the data with gzip and both copies suffer any damage in
|
|
the data stream, even if it is just one altered bit, the original data can
|
|
only be recovered by an expert, if at all.
|
|
|
|
If you used bzip2, and if the file is large enough to contain more than one
|
|
compressed data block (usually larger than @w{900 kB} uncompressed), and if
|
|
no block is damaged in both files, then the data can be manually recovered
|
|
by splitting the files with bzip2recover, checking every block, and then
|
|
copying the right blocks in the right order into another file.
|
|
|
|
But if you used lzip, the data can be automatically recovered with
|
|
@w{@samp{lziprecover --merge}} as long as the damaged areas don't overlap.
|
|
|
|
Note that each error in a bzip2 file makes a whole block unusable, but each
|
|
error in a lzip file only affects the damaged bytes, making it possible to
|
|
recover a file with thousands of errors.
|
|
|
|
|
|
@node Reproducing a mailbox
|
|
@section Recovering new messages using an old backup
|
|
@cindex reproducing a mailbox
|
|
|
|
Let's suppose that you make periodic backups of your email messages stored
|
|
in one or more mailboxes. (A mailbox is a file containing a possibly large
|
|
number of email messages). New messages are appended to the end of each
|
|
mailbox, therefore the initial part of two consecutive backups is identical
|
|
unless some messages have been changed or deleted in the meantime. The new
|
|
messages added to each backup are usually a small part of the whole mailbox.
|
|
|
|
@verbatim
|
|
+============================================+
|
|
| Older backup containing some messages |
|
|
+============================================+
|
|
+============================================+========================+
|
|
| Newer backup containing the messages above | plus some new messages |
|
|
+============================================+========================+
|
|
@end verbatim
|
|
|
|
One day you discover that your mailbox has disappeared because you deleted
|
|
it inadvertently or because of a bug in your email reader. Not only that.
|
|
You need to recover a recent message, but the last backup you made of the
|
|
mailbox (the newer backup above) has lost the data corresponding to a whole
|
|
sector because of an I/O error in the part containing the old messages.
|
|
|
|
If you compressed the mailbox with gzip, usually none of the new messages
|
|
can be recovered even if they are intact because all the data beyond the
|
|
missing sector can't be decoded.
|
|
|
|
If you used bzip2, and if the newer backup is large enough that the new
|
|
messages are in a different compressed data block than the one damaged
|
|
(usually larger than @w{900 kB} uncompressed), then you can recover the new
|
|
messages manually with bzip2recover. If the backups are identical except for
|
|
the new messages appended, you may even recover the whole newer backup by
|
|
combining the good blocks from both backups.
|
|
|
|
But if you used lzip, the whole newer backup can be automatically recovered
|
|
with @w{@samp{lziprecover --reproduce}} as long as the missing bytes can be
|
|
recovered from the older backup, even if other messages in the common part
|
|
have been changed or deleted. Mailboxes seem to be especially easy to
|
|
reproduce. The probability of reproducing a mailbox
|
|
(@pxref{performance-of-reproduce}) is almost as high as that of merging two
|
|
identical backups (@pxref{performance-of-merge}).
|
|
|
|
|
|
@node Fec files
|
|
@chapter Forward Error Correction
|
|
@cindex forward error correction
|
|
|
|
Forward Error Correction (FEC) is any way of protecting data from corruption
|
|
by creating redundant data that can be used later to repair errors in the
|
|
protected data. Lziprecover uses a Hilbert-based Reed-Solomon code to create
|
|
one fec file (with extension @file{.fec}) for each file that needs to be
|
|
protected. The fec files created by lziprecover are reproducible.
|
|
|
|
Reed-Solomon is the most space-efficient Error Correcting Code (ECC) for
|
|
data stored in block devices. It creates redundant FEC blocks in such a way
|
|
that X FEC blocks allow the recovery of any combination of up to X lost
|
|
data blocks. All the blocks (data and FEC) are of the same size, which in
|
|
fec files must be a multiple of 512 bytes. Reed-Solomon is not optimum for
|
|
corruption affecting random single bits in a file because each corrupt bit
|
|
invalidates the whole block containing it.
|
|
|
|
Usually, a corrupt file does not provide an indication of where the
|
|
corruption is located. Therefore, each fec file stores one or two arrays of
|
|
CRCs to detect the corrupt blocks in the protected file and mark them as
|
|
erasures (missing data blocks). Thus, a fec file creates its own Binary
|
|
Erasure Channel (BEC) for the protected file.
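
The idea of locating erasures with an array of CRCs can be sketched in
Python as shown below. The function names are hypothetical,
@code{zlib.crc32} stands in for whichever CRC the fec file uses, and the
treatment of a short last block is an assumption of this sketch, not a
description of lziprecover's behavior.

@verbatim
```python
import zlib

def block_crcs(data, block_size):
    """Return the CRC32 of each block of 'data' (the last may be short)."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def find_erasures(data, block_size, stored_crcs):
    """Return the indices of the blocks whose CRC does not match."""
    return [i for i, crc in enumerate(block_crcs(data, block_size))
            if crc != stored_crcs[i]]

original = bytes(range(256)) * 8          # 2048 bytes = 4 blocks of 512
stored = block_crcs(original, 512)        # computed when the file was good
damaged = bytearray(original)
damaged[600] ^= 1                         # flip one bit inside block 1
erasures = find_erasures(bytes(damaged), 512, stored)
```
@end verbatim

Here @code{erasures} is the list @code{[1]}: only the block containing the
flipped bit is marked as missing.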
|
|
|
|
Lziprecover's FEC algorithm can repair any kind of file, but its ability to
|
|
repair lzip files is greater than for other kinds of files. Lziprecover can
|
|
use the statistical properties of lzip data to repair a lzip file rescued
|
|
with ddrescue, even if the fec file is so damaged that it has lost both CRC
|
|
arrays. Lzip data helps to locate the corrupt parts of the file even without
|
|
a BEC. For this to work, at least one chksum packet header must be intact to
|
|
provide @samp{prodata_size}, @samp{prodata_md5}, and @samp{gf16}.
|
|
|
|
@menu
|
|
* How Reed-Solomon works:: It is basically an equation system
|
|
* Implementation details:: How lziprecover implements Reed-Solomon
|
|
* Creating fec files:: How to create fec files
|
|
* Testing with fec files:: How to test files using fec files
|
|
* Repairing with fec files:: How to repair files using fec files
|
|
* Fec file format:: Detailed format of the redundant FEC data
|
|
@end menu
|
|
|
|
|
|
@node How Reed-Solomon works
|
|
@section How Reed-Solomon works
|
|
@cindex Reed-Solomon tutorial
|
|
|
|
To illustrate how Reed-Solomon works on the BEC, we will use an example with
|
|
standard arithmetic on integers. Note that in lziprecover's FEC each
|
|
variable is a (potentially large) block of data, not a single value.
|
|
|
|
Given variables x, y, and z (the protected data) whose values are known, an
|
|
equation system can be created where the values of three FEC variables p, q,
|
|
and r can be computed from the values of x, y, and z:
|
|
|
|
@example
|
|
x + y + z = p (1)
|
|
x + 2y + 3z = q (2)
|
|
x + 3y + 2z = r (3)
|
|
@end example
|
|
|
|
If we have that x = 1, y = 2, and z = 3, then p = 6, q = 14, and r = 13:
|
|
|
|
@example
|
|
1 + 2 + 3 = 6 (1a)
|
|
1 + 4 + 9 = 14 (2a)
|
|
1 + 6 + 6 = 13 (3a)
|
|
@end example
|
|
|
|
Now, if the values of x and y are lost because of data corruption, they can
|
|
be recomputed by using any two of the three equations above. For example, if
|
|
we replace the known values of z, p, and q in equations (1) and (2) we get:
|
|
|
|
@example
|
|
x + y + 3 = 6 (1b)
|
|
x + 2y + 9 = 14 (2b)
|
|
@end example
|
|
|
|
In order to solve the two equations above, we first reduce them by
|
|
subtracting the values of the known data variables from the values of the
|
|
FEC variables:
|
|
|
|
@example
|
|
x + y = 6 - 3 (1c)
|
|
x + 2y = 14 - 9 (2c)
|
|
@end example
|
|
|
|
which gives the reduced FEC values P = 3 and Q = 5.
|
|
|
|
Then we create a square matrix @samp{A} with the coefficients of x and y in
|
|
the equations above, and invert it. @samp{A} must be invertible and must not
|
|
have any zero element. We also create the column vector D with the missing
|
|
data variables x and y, and the column vector F with the reduced FEC values
|
|
P and Q:
|
|
|
|
@example
|
|
D = x A = 1 1 A^-1 = 2 -1 F = P
|
|
y 1 2 -1 1 Q
|
|
@end example
|
|
|
|
Then we multiply the inverse matrix @samp{A^-1} by the column vector F to
|
|
obtain the values of x and y (D = A^-1 * F):
|
|
|
|
@example
|
|
x = 2P - Q (1d)
|
|
y = -P + Q (2d)
|
|
@end example
|
|
|
|
which finally gives us the lost values x = 1 and y = 2:
|
|
|
|
@example
|
|
x = 2 * 3 - 5 (1e)
|
|
y = -3 + 5 (2e)
|
|
@end example
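
The worked example above can be checked with a few lines of Python. This is
a sketch using exact rational arithmetic on single integers, as in the
example; the function names are hypothetical and nothing here reflects how
lziprecover operates on blocks in a Galois Field.

@verbatim
```python
from fractions import Fraction

# Coefficients of x, y, z in equations (1), (2), and (3).
COEFFS = [(1, 1, 1), (1, 2, 3), (1, 3, 2)]

def make_fec(x, y, z):
    """Compute the FEC values p, q, r from the data values."""
    return [a * x + b * y + c * z for a, b, c in COEFFS]

def recover_xy(z, p, q):
    """Recover the lost x and y from z and the FEC values p and q."""
    P = p - COEFFS[0][2] * z               # reduced FEC value, as in (1c)
    Q = q - COEFFS[1][2] * z               # reduced FEC value, as in (2c)
    (a, b), (c, d) = COEFFS[0][:2], COEFFS[1][:2]       # matrix A
    det = Fraction(a * d - b * c)
    inv = [[d / det, -b / det], [-c / det, a / det]]    # A^-1
    x = inv[0][0] * P + inv[0][1] * Q      # (1d)
    y = inv[1][0] * P + inv[1][1] * Q      # (2d)
    return x, y

p, q, r = make_fec(1, 2, 3)
x, y = recover_xy(3, p, q)
```
@end verbatim

With the values of the example, @code{make_fec(1, 2, 3)} gives 6, 14, and
13, and @code{recover_xy(3, 6, 14)} gives back x = 1 and y = 2.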
|
|
|
|
|
|
@node Implementation details
|
|
@section How lziprecover implements Reed-Solomon
|
|
@cindex Reed-Solomon details
|
|
|
|
Lziprecover's implementation of Reed-Solomon can manage up to 128 data
|
|
blocks + 128 FEC blocks when using a Galois Field of size 256 (GF(2^8)), or
|
|
up to 32768 data blocks + 32768 FEC blocks when using a Galois Field of size
|
|
65536 (GF(2^16)). GF(2^8) is included because it is faster for files up to
|
|
about @w{1 MB}. The number of FEC blocks is currently limited to 2048
|
|
because of memory and time limits. Inverting a matrix for 32768 FEC blocks
|
|
would take a week and require @w{2 GiB} of RAM.
|
|
|
|
The file is repaired in memory. Therefore, enough virtual memory
|
|
@w{(RAM + swap)} to contain the protected file and the FEC data is required.
|
|
The file size is limited to less than @w{2 GiB} on 32-bit systems. The
|
|
repaired file is checked with an MD5 digest.
|
|
|
|
Lziprecover divides the input file into 1 to 32768 data blocks of the same
|
|
size, which ranges from 512 bytes to @w{128 TiB}, for a total protected file
|
|
size of up to @w{4 EiB}. It then uses a Hilbert matrix @samp{A} to create up
|
|
to 2048 FEC blocks of the same size as the data blocks. Lziprecover corrects
|
|
errors in the data blocks by first reducing the equation system to M
|
|
equations with M unknowns each, where M is the number of missing data
|
|
blocks. Then it multiplies the inverse of the relevant submatrix of @samp{A}
|
|
by the vector of results of the M equations to recompute the values of the
|
|
missing data blocks.
|
|
|
|
Lziprecover implements GF(2^8) with polynomial 0x11D and GF(2^16) with
|
|
polynomial 0x1100B.
|
|
|
|
A Hilbert matrix is defined as @w{A[i][j] = 1 / (i + j + 1)} for
|
|
@w{i,j >= 0}. But, as in a Galois Field the addition is the exclusive or
|
|
operation, applying the Hilbert definition produces a singular (non
|
|
invertible) matrix. To avoid this problem, lziprecover uses a Hilbert matrix
|
|
starting at row @w{r0 = gf_size / 2}. I.e., @w{A[i][j] = 1 / (i + j + r0)}
|
|
for @w{0 <= i,j < r0}. (@samp{gf_size} is the size of the Galois Field).
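
The following Python sketch illustrates the shifted Hilbert matrix in
GF(2^8) with polynomial 0x11D. It assumes that the additions in
@w{i + j + r0} are exclusive or operations, as suggested by the paragraph
above; the function names are hypothetical and the slow bit-by-bit
arithmetic is chosen for clarity, not taken from lziprecover. Note that with
the unshifted definition the denominator @w{i + j + 1} becomes 0 whenever
@w{i xor j = 1}, whereas @w{i xor j xor r0} is never 0 for @w{i,j < r0}.

@verbatim
```python
POLY = 0x11D          # polynomial of GF(2^8) given in the text
GF_SIZE = 256
R0 = GF_SIZE // 2     # first row of the shifted Hilbert matrix

def gf_mul(a, b):
    """Multiply two elements of GF(2^8), reducing by POLY."""
    product = 0
    while b:
        if b & 1:
            product ^= a          # addition in the field is exclusive or
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return product

def gf_inv(a):
    """Return the multiplicative inverse of a nonzero element (a^254)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def matrix_element(i, j):
    """A[i][j] = 1 / (i + j + r0), where '+' is exclusive or."""
    return gf_inv(i ^ j ^ R0)

# Determinant of the top-left 2x2 submatrix (subtraction is also xor).
det = (gf_mul(matrix_element(0, 0), matrix_element(1, 1)) ^
       gf_mul(matrix_element(0, 1), matrix_element(1, 0)))
```
@end verbatim

The determinant computed at the end is nonzero, so that 2x2 submatrix is
invertible.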
|
|
|
|
|
|
@node Creating fec files
|
|
@section How to create fec files
|
|
@cindex fec create
|
|
|
|
@noindent
|
|
Example 1: Create the fec file @file{archive.tar.lz.fec} and store it in the
|
|
same directory where @file{archive.tar.lz} is.
|
|
|
|
@example
|
|
lziprecover -v -Fc archive.tar.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 2: Create the fec file @file{archive.tar.lz.fec} and store it in the
|
|
directory @file{fec}.
|
|
|
|
@example
|
|
lziprecover -v -Fc -o fec/ archive.tar.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 3: Create recursively one fec file for each file in the directory
|
|
@file{datadir} and store them in the tree under the directory @file{fec}.
|
|
|
|
@example
|
|
lziprecover -v -r -Fc -o fec/ datadir
|
|
@end example
|
|
|
|
@noindent
|
|
Example 4: Create fec files for a collection of photos stored in directory
|
|
@file{photos} and store them in the directory @file{photos-fec}.
|
|
|
|
@example
|
|
lziprecover -v -Fc -o photos-fec/ photos/*
|
|
@end example
|
|
|
|
|
|
@node Testing with fec files
|
|
@section How to test files using fec files
|
|
@cindex fec test
|
|
|
|
@noindent
|
|
Example 1: Test the integrity of @file{archive.tar.lz} using the fec file
|
|
@file{archive.tar.lz.fec} from the same directory.
|
|
|
|
@example
|
|
lziprecover -v -Ft archive.tar.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 2: Test the integrity of the files @file{foo.lz} and @file{bar.lz}
|
|
using the corresponding fec files stored in the directory @file{fec}.
|
|
|
|
@example
|
|
lziprecover -v -Ft --fec-file=fec/ foo.lz bar.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 3: Test recursively the integrity of all the files in the directory
|
|
@file{datadir} using the fec files stored in the directory tree under the
|
|
directory @file{fec}.
|
|
|
|
@example
|
|
lziprecover -v -r -Ft --fec-file=fec/ datadir
|
|
@end example
|
|
|
|
@noindent
|
|
Example 4: Test the integrity of a collection of photos stored in directory
|
|
@file{photos} using fec files from directory @file{photos-fec}.
|
|
|
|
@example
|
|
lziprecover -v -Ft --fec-file=photos-fec/ photos/*
|
|
@end example
|
|
|
|
|
|
@node Repairing with fec files
|
|
@section How to repair files using fec files
|
|
@cindex fec repair
|
|
|
|
@noindent
|
|
Example 1: Repair the file @file{archive.tar.lz} using the fec file
|
|
@file{archive.tar.lz.fec} from the same directory. The repaired file is
|
|
written to @file{archive_fixed.tar.lz} in the same directory.
|
|
|
|
@example
|
|
lziprecover -v -Fr archive.tar.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 2: Repair the files @file{foo.lz} and @file{bar.lz} using the
|
|
corresponding fec files stored in the directory @file{fec}.
|
|
|
|
@example
|
|
lziprecover -v -Fr --fec-file=fec/ foo.lz bar.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 3: Repair recursively all the damaged files in the directory
|
|
@file{datadir} using the fec files stored in the directory tree under the
|
|
directory @file{fec}.
|
|
|
|
@example
|
|
lziprecover -v -r -Fr --fec-file=fec/ datadir
|
|
@end example
|
|
|
|
@anchor{ddrescue-example}
|
|
@noindent
|
|
Example 4: Recover a collection of photos from a damaged external drive
|
|
(@file{/dev/sdc1}). The photos are in directory @file{photos}, and the fec
|
|
files are in directory @file{photos-fec}.
|
|
|
|
@example
|
|
ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile
|
|
mount -o loop,ro hdimage /mnt/hdimage
|
|
cp -a /mnt/hdimage/photos photos
|
|
cp -a /mnt/hdimage/photos-fec photos-fec
|
|
umount /mnt/hdimage
|
|
lziprecover -v -Fr --fec-file=photos-fec/ photos/*
|
|
(Check and rename repaired files. They are named @file{photos/*_fixed})
|
|
@end example
|
|
|
|
|
|
@node Fec file format
|
|
@section Fec file format
|
|
@cindex fec file format
|
|
|
|
A fec file consists of one chksum packet, one or more fec packets, and one
|
|
optional second chksum packet. The first chksum packet must be the first
|
|
packet in the file, but the second chksum packet does not need to be the
|
|
last packet in the file. The essential information is stored in the chksum
|
|
packet(s), while the potentially numerous fec packets are kept as simple as
|
|
possible:
|
|
|
|
@verbatim
|
|
+=================+===============+=================+
|
|
| Chksum packet | Fec packets | Chksum packet |
|
|
+=================+===============+=================+
|
|
@end verbatim
|
|
|
|
All multibyte values are stored in little endian order except
|
|
@samp{prodata_md5}.
|
|
|
|
@anchor{fbs}
|
|
The @samp{fbs} (fec_block_size) field is coded as a little endian 16-bit
|
|
floating point unsigned integer with an 11-bit mantissa at bits 0-10 and a
|
|
5-bit exponent at bits 11-15. The mantissa is an integer between 0 and 2047.
|
|
The exponent is an integer between 9 and 40, stored with a bias of -9; the
|
|
exponent 9 is stored as 0, and 40 is stored as 31. Values are stored with
|
|
the largest mantissa and smallest exponent; 4096 is stored as m=8, e=0. This
|
|
encoding can store values from 0 bytes to @w{2047 TiB} @w{(2^51 - 2^40 bytes)}
|
|
with a maximum resolution of 512 bytes, but 0 and the values beyond
|
|
@w{128 TiB} are not used:
|
|
|
|
@verbatim
|
|
5 11
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| exp | mantissa | The 'fbs' (fec_block_size) field
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
15 11 10 0
|
|
@end verbatim
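
The coding of the @samp{fbs} field described above can be sketched in
Python as follows. The function names are hypothetical; the argument and
result of the functions are the 16-bit value of the field, which is stored
in the file as two bytes in little endian order.

@verbatim
```python
def decode_fbs(field):
    """Decode the 16-bit 'fbs' field into a block size in bytes."""
    mantissa = field & 0x7FF          # bits 0-10
    exponent = (field >> 11) + 9      # bits 11-15, stored with a bias of -9
    return mantissa << exponent

def encode_fbs(size):
    """Encode a block size using the largest mantissa, smallest exponent."""
    if size <= 0 or size % 512 != 0:
        raise ValueError("block size must be a positive multiple of 512")
    mantissa, stored_exp = size >> 9, 0
    while mantissa > 2047:
        if mantissa & 1 or stored_exp == 31:
            raise ValueError("block size is not representable")
        mantissa >>= 1
        stored_exp += 1
    return (stored_exp << 11) | mantissa
```
@end verbatim

For example, @code{encode_fbs(4096)} returns 8 (m=8, e=0), and
@code{decode_fbs(0xFFFF)} returns the largest representable value,
2^51 - 2^40.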
|
|
|
|
The fec file format is 4-byte aligned for speed because FEC data are created
|
|
and decoded 4 bytes at a time. The 4-byte alignment has been achieved by
|
|
careful design, without adding any padding bytes.
|
|
|
|
The fec file format has an overhead of 8 bytes per protected data block,
|
|
plus 16 bytes per FEC block, plus 80 bytes.
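
These figures can be derived from the packet layouts given in the following
subsections. The Python sketch below assumes that the fec file contains two
chksum packets, which is the case that yields the 80-byte constant; the
function names are hypothetical.

@verbatim
```python
def fec_file_size(data_blocks, fec_blocks, block_size, chksum_packets=2):
    """Return the size in bytes of a fec file with the given geometry."""
    chksum_packet = 36 + 4 * data_blocks + 4   # header, crc_array, payload_crc
    fec_packet = 12 + block_size + 4           # header, fec_block, payload_crc
    return chksum_packets * chksum_packet + fec_blocks * fec_packet

def overhead(data_blocks, fec_blocks, block_size):
    """Return the bytes of the fec file that are not FEC block payload."""
    total = fec_file_size(data_blocks, fec_blocks, block_size)
    return total - fec_blocks * block_size
```
@end verbatim

For a file divided in 100 data blocks and protected by 8 FEC blocks,
@code{overhead(100, 8, 512)} returns @w{8 * 100 + 16 * 8 + 80 = 1008} bytes.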
|
|
|
|
@subsection Chksum packet
|
|
@cindex chksum packet
|
|
|
|
A chksum packet contains one CRC for each of the N data blocks in the
|
|
protected file, and is structured as shown in the following table. All
|
|
lengths and offsets are in decimal:
|
|
|
|
@multitable {prodata_size} {36 + 4N} {Length (in bytes)}
|
|
@headitem Field Name @tab Offset @tab Length (in bytes)
|
|
@item magic @tab 0 @tab 4
|
|
@item version @tab 4 @tab 1
|
|
@item flags @tab 5 @tab 1
|
|
@item fbs @tab 6 @tab 2
|
|
@item prodata_size @tab 8 @tab 8
|
|
@item prodata_md5 @tab 16 @tab 16
|
|
@item header_crc @tab 32 @tab 4
|
|
@item crc_array @tab 36 @tab 4N
|
|
@item payload_crc @tab 36 + 4N @tab 4
|
|
@end multitable
|
|
|
|
@table @samp
|
|
@item magic
|
|
A four byte string identifying the chksum packet (and therefore the fec
|
|
file), with the value 0xB3, 0xA5, 0xB6, 0xAF. (The complement of "LZIP").
|
|
|
|
@item version
|
|
Just in case something needs to be modified in the future. 0 for now.
|
|
|
|
@item flags
|
|
Bit 0 (is_crc_c): crc_array contains CRC32 (0) or CRC32-C (1).@*
|
|
Bit 1 (gf16): Galois field is GF(2^8) (0) or GF(2^16) (1).@*
|
|
Bits 2-7: zero.
|
|
|
|
@item fbs (coded fec_block_size)
|
|
Number of FEC bytes per block. It is a multiple of 512 bytes between 512
|
|
bytes and @w{128 TiB}. @xref{fbs}.
|
|
|
|
@item prodata_size
|
|
Size of the protected file. 1 byte to @w{4 EiB}.
|
|
|
|
@item prodata_md5
|
|
Md5sum of the protected file. Stored in big endian order.
|
|
|
|
@item header_crc
|
|
CRC32 of the previous fields, including magic.
|
|
|
|
@item crc_array
|
|
Array of @var{n} CRCs corresponding to the @var{n} blocks in which the
|
|
protected file is divided. @var{n} is @w{@samp{ceil( prodata_size / fbs )}}.
|
|
The first chksum packet contains an array of CRC32s, while the second chksum
|
|
packet (if present) contains an array of CRC32-Cs.
|
|
|
|
For the expected thousands of bit flips caused by a zeroed sector, a
|
|
symmetric CRC like CRC32 is probably better than CRC32-C, which detects all
|
|
the errors with an odd number of bit flips at the expense of a larger number
|
|
of undetected errors with an even number of bit flips.
|
|
|
|
@item payload_crc
|
|
CRC32 of the crc_array.
|
|
@end table
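
As an illustration of the table above, this Python sketch splits the first
36 bytes of a chksum packet into its fields. The function name and the keys
of the returned dictionary are hypothetical, and neither
@samp{header_crc} nor @samp{payload_crc} is verified.

@verbatim
```python
import struct

CHKSUM_MAGIC = bytes(b ^ 0xFF for b in b"LZIP")   # 0xB3, 0xA5, 0xB6, 0xAF
HEADER = struct.Struct("<4sBBHQ16sI")             # 36 bytes, little endian

def parse_chksum_header(packet):
    """Split the first 36 bytes of a chksum packet into its fields."""
    (magic, version, flags, fbs,
     size, md5, header_crc) = HEADER.unpack_from(packet)
    if magic != CHKSUM_MAGIC:
        raise ValueError("not a chksum packet")
    block_size = (fbs & 0x7FF) << ((fbs >> 11) + 9)   # decode 'fbs'
    if block_size == 0:
        raise ValueError("invalid fec_block_size")
    return {
        "version": version,
        "is_crc_c": bool(flags & 1),               # bit 0
        "gf16": bool(flags & 2),                   # bit 1
        "fec_block_size": block_size,
        "prodata_size": size,
        "prodata_md5": md5,                        # 16 bytes, kept as stored
        "header_crc": header_crc,
        "data_blocks": -(-size // block_size),     # ceil(size / block_size)
    }
```
@end verbatim

The number of data blocks obtained this way gives the length of
@samp{crc_array}, and therefore the offset of @samp{payload_crc}.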
|
|
|
|
@subsection Fec packet
|
|
@cindex fec packet
|
|
|
|
A fec packet contains one FEC block and is structured as shown in the
|
|
following table. All lengths and offsets are in decimal:
|
|
|
|
@multitable {payload_crc} {12 + fbs} {Length (in bytes)}
|
|
@headitem Field Name @tab Offset @tab Length (in bytes)
|
|
@item magic @tab 0 @tab 4
|
|
@item fbn @tab 4 @tab 2
|
|
@item fbs @tab 6 @tab 2
|
|
@item header_crc @tab 8 @tab 4
|
|
@item fec_block @tab 12 @tab fbs
|
|
@item payload_crc @tab 12 + fbs @tab 4
|
|
@end multitable
|
|
|
|
@table @samp
|
|
@item magic
|
|
A four byte string identifying the fec packet, with the value "\xB3FEC"
|
|
(0xB3, 0x46, 0x45, 0x43).
|
|
|
|
@item fbn (fec_block_number)
|
|
Number of this FEC block (0 to 32767). Required to compute the decode matrix.
|
|
|
|
@item fbs (coded fec_block_size)
|
|
Number of FEC bytes per block. It is a multiple of 512 bytes between 512
|
|
bytes and @w{128 TiB}. @xref{fbs}.
|
|
|
|
@item header_crc
|
|
CRC32 of the previous fields, including magic.
|
|
|
|
@item fec_block
|
|
The FEC block.
|
|
|
|
@item payload_crc
|
|
CRC32 of the fec_block.
|
|
@end table
|
|
|
|
|
|
@node Repairing one byte
|
|
@chapter Repairing one byte
|
|
@cindex repairing one byte
|
|
|
|
Lziprecover can perfectly repair most files with small errors (up to one
|
|
single-byte error per member), without the need for any extra redundancy at
|
|
all. If the repair is successful, the repaired file is identical bit for
|
|
bit to the original. This makes lzip files resistant to bit flips, one of the
|
|
most common forms of data corruption.
|
|
|
|
The file is repaired in memory. Therefore, enough virtual memory
|
|
@w{(RAM + swap)} to contain the largest damaged member is required. Member
|
|
size is limited to @w{2 GiB} on 32-bit systems.
|
|
|
|
The error may be located anywhere in the file except in the first 5 bytes of
|
|
each member header (magic and version) or in the @samp{Member size} field of
|
|
the trailer (last 8 bytes of each member). If the error is in the header it
|
|
can be easily repaired with a text editor like GNU Moe (@pxref{File
|
|
format}). If the error is in the member size, it is enough to ignore the
|
|
message about @samp{bad member size} when decompressing.
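
The two areas excluded above can be inspected by hand. The sketch below
assumes the member layout of the lzip format (a header starting with the
magic string "LZIP" followed by the version byte 1, and a trailer whose last
8 bytes store the member size in little endian order) and a buffer holding
exactly one member. The function names are hypothetical.

```python
# Illustrative sketch only; not part of lziprecover.
LZIP_MAGIC_AND_VERSION = b"LZIP\x01"   # magic string + version number 1

def header_is_intact(member):
    """True if the first 5 bytes (magic and version) are as expected."""
    return member[:5] == LZIP_MAGIC_AND_VERSION

def stored_member_size(member):
    """Value of the 'Member size' field (last 8 bytes, little endian)."""
    return int.from_bytes(member[-8:], "little")

def member_size_is_intact(member):
    """True if the stored member size matches the real size."""
    return stored_member_size(member) == len(member)
```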
|
|
|
|
Bit flip happens when one bit in the file is changed from 0 to 1 or vice
|
|
versa. It may be caused by bad RAM or even by natural radiation. I have
|
|
seen a case of bit flip in a file stored on a USB flash drive.
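
The following toy example shows what a bit flip is and why an integrity
check alone can be enough to undo a single-byte error. It is only a sketch:
it works on plain data protected by a CRC32 computed with Python's
@samp{zlib} module, it is not the algorithm used by lziprecover, and the
function names are hypothetical.

```python
# Toy illustration; this is NOT the algorithm used by lziprecover.
import zlib

def flip_bit(data, byte_pos, bit):
    """Return a copy of 'data' with one bit inverted."""
    out = bytearray(data)
    out[byte_pos] ^= 1 << bit
    return bytes(out)

def repair_one_byte(damaged, good_crc):
    """Try every value of every byte until the CRC32 matches.
    Return the repaired data, or None if no single-byte change works."""
    buf = bytearray(damaged)
    for pos in range(len(buf)):
        saved = buf[pos]
        for value in range(256):
            if value == saved:
                continue
            buf[pos] = value
            if zlib.crc32(buf) == good_crc:
                return bytes(buf)
        buf[pos] = saved            # nothing matched here; restore the byte
    return None
```

The search needs up to 255 trials per byte position, which hints at why
repairing a real file can take a long time.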
|
|
|
|
One byte may seem small, but most file corruptions not produced by
|
|
transmission errors or I/O errors just affect one byte, or even one bit,
|
|
of the file. Also, unlike magnetic media, where errors usually affect a
|
|
whole sector, solid-state storage devices tend to produce single-byte
|
|
errors, which lziprecover can repair.
|
|
|
|
Repairing a file can take some time. Small files or files with the error
|
|
located near the beginning can be repaired in a few seconds. But
|
|
repairing a large file compressed with a large dictionary size and with
|
|
the error located far from the beginning may take hours.
|
|
|
|
On the other hand, errors located near the beginning of the file cause
|
|
much more loss of data than errors located near the end. So lziprecover
|
|
repairs the worst errors more efficiently.
|
|
|
|
|
|
@node Merging files
|
|
@chapter Merging files
|
|
@cindex merging files
|
|
|
|
If you have several copies of a file but all of them are too damaged to
|
|
be repaired individually (@pxref{Repairing one byte}), lziprecover can try
|
|
to produce a correct file by merging the good parts of the damaged copies.
|
|
|
|
The merge may succeed even if some copies of the file have all the headers
|
|
and trailers damaged, as long as there is at least one copy of every header
|
|
and trailer intact, even if they are in different copies of the file.
|
|
|
|
The merge fails if the damaged areas overlap (at least one byte is damaged
|
|
in all copies), or are adjacent and the boundary can't be determined, or if
|
|
the copies have too many damaged areas.
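
The conditions above can be illustrated with a toy merge of two copies. The
sketch takes each area where the copies differ from one copy or the other
until an integrity check passes. It is not the algorithm used by
lziprecover: it works on plain data, uses a CRC32 from Python's @samp{zlib}
module as the check, and all the names are hypothetical.

```python
# Toy illustration; this is NOT the algorithm used by lziprecover.
import itertools
import zlib

def differing_runs(a, b):
    """Return (begin, end) pairs of the areas where a and b differ."""
    runs, begin = [], None
    for i in range(len(a)):
        if a[i] != b[i]:
            if begin is None:
                begin = i
        elif begin is not None:
            runs.append((begin, i))
            begin = None
    if begin is not None:
        runs.append((begin, len(a)))
    return runs

def merge_two_copies(a, b, good_crc):
    """Take each differing area from either copy until the CRC32 of the
    result matches. Return the merged data, or None on failure."""
    if len(a) != len(b):
        raise ValueError("copies must have the same size")
    runs = differing_runs(a, b)
    for choice in itertools.product((a, b), repeat=len(runs)):
        merged = bytearray(a)
        for (begin, end), source in zip(runs, choice):
            merged[begin:end] = source[begin:end]
        if zlib.crc32(merged) == good_crc:
            return bytes(merged)
    return None
```

When the damaged areas of both copies overlap, no combination contains the
original bytes and the function returns @samp{None}.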
|
|
|
|
All the copies to be merged must have the same size. If any of them is
|
|
larger or smaller than it should be, either because it has been truncated or
|
|
because it got some garbage data appended at the end, it can be brought to
|
|
the correct size with the following command before merging it with the other
|
|
copies:
|
|
|
|
@example
|
|
ddrescue -s<correct_size> -x<correct_size> file.lz correct_size_file.lz
|
|
@end example
|
|
|
|
@anchor{performance-of-merge}
|
|
To give you an idea of its possibilities, when merging two copies, each of
|
|
them with one damaged area affecting 1 percent of the copy, the probability
|
|
of obtaining a correct file is about 98 percent. With three such copies the
|
|
probability rises to 99.97 percent. For large files (a few MB) with small
|
|
errors (one sector damaged per copy), the probability approaches 100 percent
|
|
even with only two copies. (Supposing that the errors are randomly located
|
|
inside each copy).
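
The figures above can be approximated with a simplified model, assumed here
only for illustration: every copy has one damaged area covering the same
fraction of the file, the areas are placed uniformly at random, and the
merge fails only when all the damaged areas share at least one byte.
(Adjacent areas and other failure causes are ignored.) The function name is
hypothetical.

```python
# Simplified model, assumed here for illustration: every copy has one
# damaged area covering the same fraction of the file, placed uniformly
# at random, and the merge fails only if all damaged areas share a byte.

def merge_success_probability(copies, damaged_fraction):
    """Probability that no byte is damaged in all the copies."""
    if copies < 2 or not 0 < damaged_fraction < 1:
        raise ValueError("need >= 2 copies and a fraction in (0, 1)")
    # Ratio between the area length and the range of start positions.
    r = damaged_fraction / (1 - damaged_fraction)
    if r >= 1:
        return 0.0          # areas this large always overlap
    n = copies
    # Probability that n uniform start points lie within r of each other
    # (distribution of the range of n uniform samples).
    p_fail = n * r ** (n - 1) - (n - 1) * r ** n
    return 1 - p_fail
```

Under this model, two copies with 1 percent damaged give about 98 percent,
and three copies about 99.97 percent, in line with the figures quoted above.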
|
|
|
|
Some types of solid-state device (NAND flash, for example) can produce
|
|
bursts of scattered single-bit errors. Lziprecover is able to merge
|
|
files with thousands of such scattered errors by grouping the errors
|
|
into clusters and then merging the files as if each cluster were a
|
|
single error.
|
|
|
|
Here is a real case of successful merging. Two copies of the file
|
|
@file{icecat-3.5.3-x86.tar.lz} (compressed size @w{9 MB}) became corrupt
|
|
while stored on the same NAND flash device. One of the copies had 76
|
|
single-bit errors scattered in an area of 1020 bytes, and the other had
|
|
3028 such errors in an area of 31729 bytes. Lziprecover produced a
|
|
correct file, identical to the original, in just 5 seconds:
|
|
|
|
@example
|
|
lziprecover -vvm a/icecat-3.5.3-x86.tar.lz b/icecat-3.5.3-x86.tar.lz
|
|
Merging member 1 of 1 (2552 errors)
|
|
2552 errors have been grouped in 16 clusters.
|
|
Trying variation 2 of 2, block 2
|
|
Input files merged successfully.
|
|
@end example
|
|
|
|
Note that the number of errors reported by lziprecover (2552) is lower
|
|
than the number of corrupt bytes (3104) because contiguous corrupt bytes
|
|
are counted as a single multibyte error.
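
The counting rule can be sketched as follows. This is an illustration of the
rule just stated, not code from lziprecover. It compares two equally sized
copies and counts both the differing bytes and the runs of contiguous
differing bytes. The function name is hypothetical.

```python
def count_errors(a, b):
    """Return (differing_bytes, errors), where each run of contiguous
    differing bytes counts as a single multibyte error."""
    differing = errors = 0
    in_run = False
    for x, y in zip(a, b):
        if x != y:
            differing += 1
            if not in_run:
                errors += 1         # a new run starts here
            in_run = True
        else:
            in_run = False
    return differing, errors
```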
|
|
|
|
@anchor{ddrescue-example2}
|
|
@noindent
|
|
Example 1: Recover a compressed backup from two copies on CD-ROM with
|
|
error-checked merging of copies.
|
|
|
|
@example
|
|
ddrescue -d -r1 -b2048 /dev/cdrom cdimage1 mapfile1
|
|
mount -t iso9660 -o loop,ro cdimage1 /mnt/cdimage
|
|
cp /mnt/cdimage/backup.tar.lz rescued1.tar.lz
|
|
umount /mnt/cdimage
|
|
(insert second copy in the CD drive)
|
|
ddrescue -d -r1 -b2048 /dev/cdrom cdimage2 mapfile2
|
|
mount -t iso9660 -o loop,ro cdimage2 /mnt/cdimage
|
|
cp /mnt/cdimage/backup.tar.lz rescued2.tar.lz
|
|
umount /mnt/cdimage
|
|
lziprecover -m -v -o backup.tar.lz rescued1.tar.lz rescued2.tar.lz
|
|
Input files merged successfully.
|
|
lziprecover -tv backup.tar.lz
|
|
backup.tar.lz: ok
|
|
@end example
|
|
|
|
@noindent
|
|
Example 2: Recover the first volume of those created with the command
|
|
@w{@samp{lzip -b 32MiB -S 650MB big_db}} from two copies,
|
|
@file{big_db1_00001.lz} and @file{big_db2_00001.lz}, with member 07
|
|
damaged in the first copy, member 18 damaged in the second copy, and
|
|
member 12 damaged in both copies. The correct file produced is saved in
|
|
@file{big_db_00001.lz}.
|
|
|
|
@example
|
|
lziprecover -m -v -o big_db_00001.lz big_db1_00001.lz big_db2_00001.lz
|
|
Input files merged successfully.
|
|
lziprecover -tv big_db_00001.lz
|
|
big_db_00001.lz: ok
|
|
@end example
|
|
|
|
|
|
@node Reproducing one sector
|
|
@chapter Reproducing one sector
|
|
@cindex reproducing one sector
|
|
|
|
Lziprecover can recover a zeroed sector in a lzip file by concatenating the
|
|
decompressed contents of the file up to the beginning of the zeroed sector
|
|
and the uncompressed data corresponding to the zeroed sector, and then
|
|
feeding the concatenated data to the same version of lzip that created the
|
|
file. For this to work, a reference file is required containing the
|
|
uncompressed data corresponding to the missing compressed data of the zeroed
|
|
sector, plus some context data before and after them. It is possible to
|
|
recover a large file using just a few kB of reference data.
|
|
|
|
The difficult part is finding a suitable reference file. It must contain the
|
|
exact data required (possibly mixed with other data). Containing similar
|
|
data is not enough.
|
|
|
|
A zeroed sector may be caused by the incomplete recovery of a damaged
|
|
storage device (with I/O errors) using, for example, ddrescue. The
|
|
reproduction can't be done if the zeroed sector overlaps with the first 15
|
|
bytes of a member, or if the zeroed sector is smaller than 8 bytes.
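
Before trying a reproduction it may help to confirm where the zeroed sector
lies. The sketch below scans a buffer for aligned sectors containing only
zero bytes. It is only an inspection aid with a hypothetical function name;
lziprecover locates the bad area by itself.

```python
def find_zeroed_sectors(data, sector_size=512):
    """Return the file positions of the aligned sectors of 'data' that
    contain only zero bytes."""
    zeros = bytes(sector_size)
    positions = []
    for pos in range(0, len(data) - sector_size + 1, sector_size):
        if data[pos:pos + sector_size] == zeros:
            positions.append(pos)
    return positions
```

A sector of incompressible compressed data is very unlikely to consist only
of zeros, so any position reported is a good candidate for a zeroed sector.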
|
|
|
|
The file is reproduced in memory. Therefore, enough virtual memory
|
|
@w{(RAM + swap)} to contain the damaged member is required. Member size is
|
|
limited to @w{2 GiB} on 32-bit systems.
|
|
|
|
To understand how it works, take any lzipped file, say @file{foo.lz},
|
|
decompress it (keeping the original), and try to reproduce an artificially
|
|
zeroed sector in it by running the following commands:
|
|
|
|
@example
|
|
lzip -kd foo.lz
|
|
lziprecover -vv --debug-reproduce=65536,512 --reference-file=foo foo.lz
|
|
@end example
|
|
|
|
@noindent
|
|
which should produce an output like the following:
|
|
|
|
@example
|
|
Reproducing: foo.lz
|
|
Reference file: foo
|
|
Testing sectors of size 512 at file positions 65536 to 66047
|
|
(master mpos = 65536, dpos = 296892)
|
|
foo: Match found at offset 296892
|
|
Reproduction succeeded at pos 65536
|
|
|
|
1 sectors tested
|
|
1 reproductions returned with zero status
|
|
all comparisons passed
|
|
@end example
|
|
|
|
Using @file{foo} as reference file guarantees that any zeroed sector in
|
|
@file{foo.lz} can be reproduced because both files contain the same data. In
|
|
real use, the reference file needs to contain the data corresponding to the
|
|
zeroed sector, but the rest of the data (if any) may differ between both
|
|
files. The reference data may be obtained from the partial decompression of
|
|
the damaged file itself if it contains repeated data, for example if the
|
|
damaged file is a compressed tarball containing several partially modified
|
|
versions of the same file.
|
|
|
|
The offset reported by lziprecover is the position in the reference file of
|
|
the first byte that could not be decompressed. This is the first byte that
|
|
will be compressed to reproduce the zeroed sector.
|
|
|
|
The reproduce mode tries to reproduce the missing compressed data originally
|
|
present in the zeroed sector. It is based on the perfect reproducibility of
|
|
lzip files (lzip produces identical compressed output from identical input).
|
|
Therefore, the same version of lzip that created the file to be reproduced
|
|
should be used to reproduce the zeroed sector. Nearby versions may also work
|
|
because the output of lzip changes infrequently. If reproducing a tar.lz
|
|
archive created with tarlz, the version of lzip, clzip, or minilzip
|
|
corresponding to the version of the lzlib library used by tarlz to create
|
|
the archive should be used.
|
|
|
|
When recovering a tar.lz archive and using as reference a file from the
|
|
filesystem, if the zeroed sector encodes (part of) a tar header, the archive
|
|
can't be reproduced. Therefore, the less overhead (smaller headers) a tar
|
|
archive has, the more probable it is that the zeroed sector does not include a
|
|
header, and that the archive can be reproduced. The tarlz format has minimum
|
|
overhead. It uses basic ustar headers, and only adds extended pax headers
|
|
when they are required.
|
|
|
|
@anchor{performance-of-reproduce}
|
|
@section Performance of @option{--reproduce}
|
|
|
|
Reproduce mode is especially useful when recovering a corrupt backup (or a
|
|
corrupt source tarball) that is part of a series. Usually only a small
|
|
fraction of the data changes from one backup to the next or from one version
|
|
of a source tarball to the next. This sometimes makes it possible to reproduce
|
|
a given corrupted version using reference data from a nearby version. The
|
|
following two tables show the fraction of reproducible sectors (reproducible
|
|
sectors divided by total sectors in archive) for some archives, using sector
|
|
sizes of 512 and 4096 bytes. @file{mailbox-aug.tar.lz} is a backup of some
|
|
of my mailboxes. @file{backup-feb.tar.lz} and @file{backup-apr.tar.lz} are
|
|
real backups of my own working directory:
|
|
|
|
@multitable {Reference file} {gawk-5.0.1.tar.lz} {4369 / 5844 = 74.76%}
|
|
@headitem Reference file @tab File @tab Reproducible (512)
|
|
@item backup-feb.tar @tab backup-apr.tar.lz @tab 3273 / 4342 = 75.38%
|
|
@item backup-apr.tar @tab backup-feb.tar.lz @tab 3259 / 4161 = 78.32%
|
|
@item gawk-5.0.0.tar @tab gawk-5.0.1.tar.lz @tab 4369 / 5844 = 74.76%
|
|
@item gawk-5.0.1.tar @tab gawk-5.0.0.tar.lz @tab 4379 / 5603 = 78.15%
|
|
@item gmp-6.1.1.tar @tab gmp-6.1.2.tar.lz @tab 2454 / 3787 = 64.8%
|
|
@item gmp-6.1.2.tar @tab gmp-6.1.1.tar.lz @tab 2461 / 3782 = 65.07%
|
|
@end multitable
|
|
|
|
@multitable {mailbox-mar.tar} {mailbox-aug.tar.lz} {4036 / 4252 = 94.92%}
|
|
@headitem Reference file @tab File @tab Reproducible (4096)
|
|
@item mailbox-mar.tar @tab mailbox-aug.tar.lz @tab 4036 / 4252 = 94.92%
|
|
@item backup-feb.tar @tab backup-apr.tar.lz @tab 264 / 542 = 48.71%
|
|
@item backup-apr.tar @tab backup-feb.tar.lz @tab 264 / 520 = 50.77%
|
|
@item gawk-5.0.0.tar @tab gawk-5.0.1.tar.lz @tab 327 / 730 = 44.79%
|
|
@item gawk-5.0.1.tar @tab gawk-5.0.0.tar.lz @tab 326 / 700 = 46.57%
|
|
@item gmp-6.1.1.tar @tab gmp-6.1.2.tar.lz @tab 175 / 473 = 37%
|
|
@item gmp-6.1.2.tar @tab gmp-6.1.1.tar.lz @tab 181 / 472 = 38.35%
|
|
@end multitable
|
|
|
|
Note that the "performance of reproduce" is a probability, not a partial
|
|
recovery. The data are either recovered fully (with the probability X shown
|
|
in the last column of the tables above) or not recovered at all (with
|
|
probability @w{1 - X}).
|
|
|
|
@noindent
|
|
Example 1: Recover a damaged source tarball with a zeroed sector of 512
|
|
bytes at file position 1019904, using as reference another source tarball
|
|
for a different version of the software.
|
|
|
|
@example
|
|
lziprecover -vv -e --reference-file=gmp-6.1.1.tar gmp-6.1.2.tar.lz
|
|
Reproducing bad area in member 1 of 1
|
|
(begin = 1019904, size = 512, value = 0x00)
|
|
(master mpos = 1019903, dpos = 6292134)
|
|
warning: gmp-6.1.1.tar: Partial match found at offset 6277798, len 8716.
|
|
Reference data may be mixed with other data.
|
|
Trying level -9
|
|
Reproducing position 1015808
|
|
Member reproduced successfully.
|
|
Copy of input file reproduced successfully.
|
|
@end example
|
|
|
|
@anchor{ddrescue-example3}
|
|
@noindent
|
|
Example 2: Recover a damaged backup with a zeroed sector of 4096 bytes at
|
|
file position 1019904, using as reference a previous backup. The damaged
|
|
backup comes from a damaged partition copied with ddrescue.
|
|
|
|
@example
|
|
ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile
|
|
mount -o loop,ro hdimage /mnt/hdimage
|
|
cp /mnt/hdimage/backup.tar.lz backup.tar.lz
|
|
umount /mnt/hdimage
|
|
lzip -t backup.tar.lz
|
|
backup.tar.lz: Decoder error at pos 1020530
|
|
lziprecover -vv -e --reference-file=old_backup.tar backup.tar.lz
|
|
Reproducing bad area in member 1 of 1
|
|
(begin = 1019904, size = 4096, value = 0x00)
|
|
(master mpos = 1019903, dpos = 5857954)
|
|
warning: old_backup.tar: Partial match found at offset 5743778, len 9546.
|
|
Reference data may be mixed with other data.
|
|
Trying level -9
|
|
Reproducing position 1015808
|
|
Member reproduced successfully.
|
|
Copy of input file reproduced successfully.
|
|
@end example
|
|
|
|
@noindent
|
|
Example 3: Recover a damaged backup with a zeroed sector of 4096 bytes at
|
|
file position 1019904, using as reference a file from the filesystem. (If
|
|
the zeroed sector encodes (part of) a tar header, the tarball can't be
|
|
reproduced).
|
|
|
|
@example
|
|
# List the contents of the backup tarball to locate the damaged member.
|
|
tarlz -n0 -tvf backup.tar.lz
|
|
[...]
|
|
example.txt
|
|
tarlz: Skipping to next header.
|
|
tarlz: backup.tar.lz: Archive ends unexpectedly.
|
|
# Find in the filesystem the last file listed and use it as reference.
|
|
lziprecover -vv -e --reference-file=/somedir/example.txt backup.tar.lz
|
|
Reproducing bad area in member 1 of 1
|
|
(begin = 1019904, size = 4096, value = 0x00)
|
|
(master mpos = 1019903, dpos = 5857954)
|
|
/somedir/example.txt: Match found at offset 9378
|
|
Trying level -9
|
|
Reproducing position 1015808
|
|
Member reproduced successfully.
|
|
Copy of input file reproduced successfully.
|
|
@end example
|
|
|
|
If @file{backup.tar.lz} is a multimember file with more than one member
|
|
damaged and lziprecover shows the message @samp{One member reproduced. Copy
|
|
of input file still contains errors.}, the procedure shown in the example
|
|
above can be repeated until all the members have been reproduced.
|
|
|
|
@samp{tarlz --keep-damaged -n0 -xf backup.tar.lz example.txt} produces a
|
|
partial copy of the reference file @file{example.txt} that may help locate a
|
|
complete copy in the filesystem or in another backup, even if
|
|
@file{example.txt} has been renamed.
|
|
|
|
|
|
@node Tarlz
|
|
@chapter Options supporting the tar.lz format
|
|
@cindex tarlz
|
|
|
|
@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,Tarlz} is a
|
|
massively parallel (multi-threaded) combined implementation of the tar
|
|
archiver and the
|
|
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html,,lzip} compressor.
|
|
|
|
Tarlz creates tar archives using a simplified and safer variant of the POSIX
|
|
pax format compressed in lzip format, keeping the alignment between tar
|
|
members and lzip members. The resulting multimember tar.lz archive is
|
|
backward compatible with standard tar tools like GNU tar, which treat it
|
|
like any other tar.lz archive.
|
|
@ifnothtml
|
|
@xref{Top,tarlz manual,,tarlz}, and @ref{Top,lzip manual,,lzip}.
|
|
@end ifnothtml
|
|
|
|
Multimember tar.lz archives have some safety advantages over solidly
|
|
compressed tar.lz archives. For example, in case of corruption, tarlz can
|
|
extract all the undamaged members from the tar.lz archive, skipping over the
|
|
damaged members, just like the standard (uncompressed) tar. Keeping the
|
|
alignment between tar members and lzip members minimizes the amount of data
|
|
lost in case of corruption. In this chapter we'll explain the ways in which
|
|
lziprecover can recover and process multimember tar.lz archives.
|
|
|
|
@section Recovering damaged multimember tar.lz archives
|
|
|
|
If you have several copies of the damaged archive, try merging them first
|
|
because merging has a high probability of success. @xref{Merging files}. If
|
|
the command below prints something like
|
|
@w{@samp{Input files merged successfully.}} you are done and
|
|
@file{archive.tar.lz} now contains the recovered archive:
|
|
|
|
@example
|
|
lziprecover -m -v -o archive.tar.lz a/archive.tar.lz b/archive.tar.lz
|
|
@end example
|
|
|
|
If you only have one copy of the damaged archive with a zeroed block of data
|
|
caused by an I/O error, you may try to reproduce the archive.
|
|
@xref{Reproducing one sector}. If the command below prints something like
|
|
@w{@samp{Copy of input file reproduced successfully.}} you are done and
|
|
@file{archive_fixed.tar.lz} now contains the recovered archive:
|
|
|
|
@example
|
|
lziprecover -vv -e --reference-file=old_archive.tar archive.tar.lz
|
|
@end example
|
|
|
|
If you only have one copy of the damaged archive, you may try to repair the
|
|
archive, but this has a lower probability of success. @xref{Repairing one
|
|
byte}. If the command below prints something like
|
|
@w{@samp{Copy of input file repaired successfully.}} you are done and
|
|
@file{archive_fixed.tar.lz} now contains the recovered archive:
|
|
|
|
@example
|
|
lziprecover -v --byte-repair archive.tar.lz
|
|
@end example
|
|
|
|
If all the above fails, and the archive was created with tarlz, you may save
|
|
the damaged members for later and then copy the good members to another
|
|
archive. If the two commands below succeed, @file{bad_members.tar.lz} will
|
|
contain all the damaged members and @file{archive_cleaned.tar.lz} will
|
|
contain a good archive with the damaged members removed:
|
|
|
|
@example
|
|
lziprecover -v --dump=damaged -o bad_members.tar.lz archive.tar.lz
|
|
lziprecover -v --strip=damaged -o archive_cleaned.tar.lz archive.tar.lz
|
|
@end example
|
|
|
|
You can then use @samp{tarlz --keep-damaged} to recover as much data as
|
|
possible from each damaged member in @file{bad_members.tar.lz}:
|
|
|
|
@example
|
|
mkdir tmp
|
|
cd tmp
|
|
tarlz --keep-damaged -xvf ../bad_members.tar.lz
|
|
@end example
|
|
|
|
@section Processing multimember tar.lz archives
|
|
|
|
Lziprecover is able to copy a list of members from one file to another.
|
|
For example the command
|
|
@w{@samp{lziprecover --dump=1-10:r1:tdata archive.tar.lz > subarch.tar.lz}}
|
|
creates a subset archive containing the first ten members, the end-of-file
|
|
blocks, and the trailing data (if any) of @file{archive.tar.lz}. The
|
|
@samp{r1} part selects the last member, which in an appendable tar.lz
|
|
archive contains the end-of-file blocks.
|
|
|
|
|
|
@node File names
|
|
@chapter Names of the files produced by lziprecover
|
|
@cindex file names
|
|
|
|
The name of the fixed file produced by @option{--byte-repair} and
|
|
@option{--merge} is made by appending the string @file{_fixed.lz} to the
|
|
original file name. If the original file name ends with one of the
|
|
extensions @file{.tar.lz}, @file{.lz}, or @file{.tlz}, the string
|
|
@file{_fixed} is inserted before the extension.
|
|
|
|
The name of the fixed file produced by @option{--fec=repair} is made by
|
|
appending the string @file{_fixed} to the original file name. If the
|
|
original file name ends with one of the extensions @file{.tar.lz}, @file{.lz},
|
|
or @file{.tlz}, the string @file{_fixed} is inserted before the extension.
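
The two naming rules can be summarized in a short sketch. The function name
and the @samp{fec_repair} parameter are hypothetical. They only mirror the
rules described above.

```python
EXTENSIONS = (".tar.lz", ".lz", ".tlz")   # ".tar.lz" is tested before ".lz"

def fixed_name(name, fec_repair=False):
    """Name of the fixed file following the two rules described above."""
    for ext in EXTENSIONS:
        if name.endswith(ext):
            return name[:-len(ext)] + "_fixed" + ext
    return name + ("_fixed" if fec_repair else "_fixed.lz")
```

For example, @file{archive.tar.lz} becomes @file{archive_fixed.tar.lz} in
both cases, while @file{foo} becomes @file{foo_fixed.lz} with
@option{--byte-repair} or @option{--merge} and @file{foo_fixed} with
@option{--fec=repair}.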
|
|
|
|
|
|
@node Trailing data
|
|
@chapter Extra data appended to the file
|
|
@cindex trailing data
|
|
|
|
Sometimes extra data are found appended to a lzip file after the last
|
|
member. Such trailing data may be:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Padding added to make the file size a multiple of some block size, for
|
|
example when writing to a tape. It is safe to append any amount of
|
|
padding zero bytes to a lzip file.
|
|
|
|
@item
|
|
Useful data added by the user: an 'End Of File' string (to check that the
|
|
file has not been truncated), a cryptographically secure hash, a description
|
|
of file contents, etc. It is safe to append any amount of text to a lzip
|
|
file as long as none of the first four bytes of the text matches the
|
|
corresponding byte in the string "LZIP", and the text does not contain any
|
|
zero bytes (null characters). Nonzero bytes and zero bytes can't be safely
|
|
mixed in trailing data.
|
|
|
|
@item
|
|
Garbage added by some not totally successful copy operation.
|
|
|
|
@item
|
|
Malicious data added to the file in order to make its total size and
|
|
hash value (for a chosen hash) coincide with those of another file.
|
|
|
|
@item
|
|
In rare cases, trailing data could be the corrupt header of another
|
|
member. In multimember or concatenated files the probability of
|
|
corruption happening in the magic bytes is 5 times smaller than the
|
|
probability of getting a false positive caused by the corruption of the
|
|
integrity information itself. Therefore it can be considered to be below
|
|
the noise level. Additionally, the test used by lziprecover to discriminate
|
|
trailing data from a corrupt header has a Hamming distance (HD) of 3,
|
|
and the 3 bit flips must happen in different magic bytes for the test to
|
|
fail. In any case, the option @option{--trailing-error} guarantees that
|
|
any corrupt header is detected.
|
|
@end itemize
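
The rule given above for user-added text can be checked before appending
anything to a file. The following is a minimal sketch with a hypothetical
function name. It tests only the two conditions stated in the list (no byte
of the first four matching "LZIP" at the same position, and no zero bytes).

```python
def is_safe_trailing_text(text):
    """Check the two conditions given above for appended text (bytes)."""
    if 0 in text:
        return False              # zero bytes are not allowed in text
    # None of the first four bytes may match the same byte of "LZIP".
    return all(t != m for t, m in zip(text[:4], b"LZIP"))
```

Note that a line of lowercase hexadecimal digits, like the output of
@samp{sha256sum}, always passes this check.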
|
|
|
|
Trailing data are in no way part of the lzip file format, but tools
|
|
reading lzip files are expected to behave as correctly and usefully as
|
|
possible in the presence of trailing data.
|
|
|
|
Trailing data can be safely ignored in most cases. In some cases, like
|
|
that of user-added data, they are expected to be ignored. In those cases
|
|
where a file containing trailing data must be rejected, the option
|
|
@option{--trailing-error} can be used. @xref{--trailing-error}.
|
|
|
|
Lziprecover facilitates the management of metadata stored as trailing
|
|
data in lzip files. See the following examples:
|
|
|
|
@noindent
|
|
Example 1: Add a comment or description to a compressed file.
|
|
|
|
@example
|
|
# First append the comment as trailing data to a lzip file
|
|
echo 'This file contains this and that' >> file.lz
|
|
# This command prints the comment to standard output
|
|
lziprecover --dump=tdata file.lz
|
|
# This command outputs file.lz without the comment
|
|
lziprecover --strip=tdata file.lz > stripped_file.lz
|
|
# This command removes the comment from file.lz
|
|
lziprecover --remove=tdata file.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 2: Add and check a cryptographically secure hash. (This may be
|
|
convenient, but a separate copy of the hash must be kept in a safe place
|
|
to guarantee that both file and hash have not been maliciously replaced).
|
|
|
|
@example
|
|
sha256sum < file.lz >> file.lz
|
|
lziprecover --strip=tdata file.lz | sha256sum -c \
|
|
<(lziprecover --dump=tdata file.lz)
|
|
@end example
|
|
|
|
|
|
@node Examples
|
|
@chapter A small tutorial with examples
|
|
@cindex examples
|
|
|
|
Example 1: Extract all the files from archive @file{foo.tar.lz}.
|
|
|
|
@example
|
|
tar -xf foo.tar.lz
|
|
or
|
|
lziprecover -cd foo.tar.lz | tar -xf -
|
|
@end example
|
|
|
|
@noindent
|
|
Example 2: Restore a regular file from its compressed version
|
|
@file{file.lz}. If the operation is successful, @file{file.lz} is removed.
|
|
|
|
@example
|
|
lziprecover -d file.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 3: Check the integrity of the compressed file @file{file.lz} and
|
|
show status.
|
|
|
|
@example
|
|
lziprecover -tv file.lz
|
|
@end example
|
|
|
|
@anchor{concat-example}
|
|
@noindent
|
|
Example 4: The right way of concatenating the decompressed output of two or
|
|
more compressed files. @xref{Trailing data}.
|
|
|
|
@example
|
|
Don't do this
|
|
cat file1.lz file2.lz file3.lz | lziprecover -d -
|
|
Do this instead
|
|
lziprecover -cd file1.lz file2.lz file3.lz
|
|
You may also concatenate the compressed files like this
|
|
lziprecover --strip=tdata file1.lz file2.lz file3.lz > file123.lz
|
|
Or keeping the trailing data of the last file like this
|
|
lziprecover --strip=empty file1.lz file2.lz file3.lz > file123.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 5: Decompress @file{file.lz} partially until @w{10 KiB} of
|
|
decompressed data are produced.
|
|
|
|
@example
|
|
lziprecover -D 0,10KiB file.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 6: Decompress @file{file.lz} partially from decompressed byte at
|
|
offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).
|
|
|
|
@example
|
|
lziprecover -D 10000-15000 file.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 7: Repair a corrupt byte in the file @file{file.lz}. (Indented lines
|
|
are abridged diagnostic messages from lziprecover).
|
|
|
|
@example
|
|
lziprecover -v --byte-repair file.lz
|
|
  Copy of input file repaired successfully.
|
|
lziprecover -tv file_fixed.lz
|
|
  file_fixed.lz: ok
|
|
mv file_fixed.lz file.lz
|
|
@end example
|
|
|
|
@noindent
|
|
Example 8: Split the multimember file @file{file.lz} and write each member
|
|
in its own @file{recXXXfile.lz} file. Then use @w{@samp{lziprecover -t}} to
|
|
test the integrity of the resulting files.
|
|
|
|
@example
|
|
lziprecover -s file.lz
|
|
lziprecover -tv rec*file.lz
|
|
@end example
|
|
|
|
|
|
@node Unzcrash
|
|
@chapter Testing the robustness of decompressors
|
|
@cindex unzcrash
|
|
|
|
@xref{--unzcrash}, for a faster way of testing the robustness of lzip.
|
|
|
|
The lziprecover package also includes unzcrash, a program written to test
|
|
robustness to decompression of corrupted data, inspired by unzcrash.c from
|
|
Julian Seward's bzip2. Type @samp{make unzcrash} in the lziprecover source
|
|
directory to build it.
|
|
|
|
By default, unzcrash reads the file specified and then repeatedly
|
|
decompresses it, incrementing each byte of the compressed data 256 times, so
|
|
as to test all possible one-byte errors. Note that it may take years or even
|
|
centuries to test all possible one-byte errors in a large file (tens of MB).
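
A rough estimate shows where such run times come from. In the sketch below
the function names are hypothetical and the time per decompression is an
assumption that must be measured for the decompressor and machine being
tested.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def one_byte_error_tests(file_size):
    """Number of decompressions needed to try the 255 wrong values of
    every byte of a file of 'file_size' bytes."""
    return 255 * file_size

def estimated_years(file_size, seconds_per_test):
    """Rough run time in years; 'seconds_per_test' is an assumption to
    be measured on the machine and decompressor being tested."""
    tests = one_byte_error_tests(file_size)
    return tests * seconds_per_test / SECONDS_PER_YEAR
```

For a file of 10 MB this gives 2550 million decompressions. Assuming a
tenth of a second per decompression, the whole test would take about eight
years.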
|
|
|
|
If the option @option{--block} is given, unzcrash reads the file specified and
|
|
then repeatedly decompresses it, setting all bytes in each successive block
|
|
to the value given, so as to test all possible full sector errors.
|
|
|
|
If the option @option{--truncate} is given, unzcrash reads the file specified
|
|
and then repeatedly decompresses it, truncating the file to increasing
|
|
lengths, so as to test all possible truncation points.
|
|
|
|
None of the three test modes described above should cause any invalid memory
|
|
accesses. If any of them does, please report it as a bug to the maintainers
|
|
of the decompressor being tested.
|
|
|
|
Unzcrash actually executes as a subprocess the shell command specified in the
|
|
first non-option argument, and then writes the file specified in the second
|
|
non-option argument to the standard input of the subprocess, modifying the
|
|
corresponding byte each time. Therefore unzcrash can be used to test any
|
|
decompressor (not only lzip), or even other decoder programs having a
|
|
suitable command-line syntax.
|
|
|
|
If the decompressor returns with zero status, unzcrash compares the output
|
|
of the decompressor for the original and corrupt files. If the outputs
|
|
differ, it means that the decompressor returned a false negative; it failed
|
|
to recognize the corruption and produced garbage output. The only exception
|
|
is when a multimember file is truncated just after the last byte of a
|
|
member, producing a shorter but valid compressed file. Except in this latter
|
|
case, please report any false negative as a bug.
|
|
|
|
In order to compare the outputs, unzcrash needs a @samp{zcmp} program able
|
|
to understand the format being tested. For example the @samp{zcmp} provided
|
|
by @uref{http://www.nongnu.org/zutils/manual/zutils_manual.html#Zcmp,,zutils}.
|
|
If the @samp{zcmp} program used does not understand the format being tested,
|
|
all the comparisons fail because the compressed files are compared without
|
|
being decompressed first. Use @option{--zcmp=false} to disable comparisons.
|
|
@ifnothtml
|
|
@xref{Zcmp,,,zutils}.
|
|
@end ifnothtml
|
|
|
|
The format for running unzcrash is:
|
|
|
|
@example
|
|
unzcrash [@var{options}] 'lzip -t' @var{file}
|
|
@end example
|
|
|
|
@noindent
|
|
The compressed @var{file} must not contain errors and the decompressor being
|
|
tested must decompress it correctly for the comparisons to work.
|
|
|
|
@noindent
|
|
unzcrash supports the following options:
|
|
|
|
@table @code
|
|
@item -h
|
|
@itemx --help
|
|
Print an informative help message describing the options and exit.
|
|
|
|
@item -V
|
|
@itemx --version
|
|
Print the version number of unzcrash on the standard output and exit.
|
|
This version number should be included in all bug reports.
|
|
|
|
@item -b @var{range}
|
|
@itemx --bits=@var{range}
|
|
Test N-bit errors only, instead of testing all the 255 wrong values for
|
|
each byte. @samp{N-bit error} means any value differing from the
|
|
original value in N bit positions, not a value differing from the
|
|
original value in the bit position N.@*
|
|
The number of N-bit errors per byte (N = 1 to 8) is:
|
|
@w{8 28 56 70 56 28 8 1}
|
|
|
|
@multitable {Examples of @var{range}} {Tests errors of N-bits}
|
|
@headitem Examples of @var{range} @tab Tests errors of N-bits
|
|
@item 1 @tab 1
|
|
@item 1,2,3 @tab 1, 2, 3
|
|
@item 2-4 @tab 2, 3, 4
|
|
@item 1,3-5,8 @tab 1, 3, 4, 5, 8
|
|
@item 1-3,5-8 @tab 1, 2, 3, 5, 6, 7, 8
|
|
@end multitable
|
|
|
|
@item -B[@var{size}][,@var{value}]
|
|
@itemx --block[=@var{size}][,@var{value}]
|
|
Test block errors of given @var{size}, simulating a whole sector I/O error
|
|
by setting all the bytes in the block to @var{value} before attempting
|
|
decompression. @var{size} defaults to 512 bytes. @var{value} defaults to 0.
|
|
By default, only contiguous, non-overlapping blocks are tested, but this may
|
|
be changed with the option @option{--delta}.
|
|
|
|
@item -d @var{n}
|
|
@itemx --delta=@var{n}
|
|
Test one byte, block, or truncation size every @var{n} bytes. If
|
|
@option{--delta} is not specified, unzcrash tests all the bytes,
|
|
non-overlapping blocks, or truncation sizes. Values of @var{n} smaller than
|
|
the block size result in overlapping blocks. (Which is convenient for
|
|
testing because there are usually too few non-overlapping blocks in a file).
|
|
|
|
@anchor{--set-byte}
|
|
@item -e @var{position},@var{value}
|
|
@itemx --set-byte=@var{position},@var{value}
|
|
Set byte at @var{position} to @var{value} in the internal buffer after
|
|
reading and testing @var{file} but before the first test call to the
|
|
decompressor. Byte positions start at 0. If @var{value} is preceded by
|
|
@samp{+}, it is added to the original value of the byte at @var{position}.
|
|
If @var{value} is preceded by @samp{f} (flip), it is XORed with the original
|
|
value of the byte at @var{position}. This option can be used to run tests
|
|
with a changed dictionary size, for example.
|
|
|
|
@item -n
@itemx --no-check
Skip the initial test of @var{file} and of @samp{zcmp}. This may speed
things up considerably when testing many (or large) known good files.

@item -p @var{bytes}
@itemx --position=@var{bytes}
First byte position to test in the file. Defaults to 0. Negative values
are relative to the end of the file.

@item -q
@itemx --quiet
Quiet operation. Suppress all messages.

@item -s @var{bytes}
@itemx --size=@var{bytes}
Number of byte positions to test. If not specified, the rest of the file
is tested (from @option{--position} to end of file). Negative values are
relative to the rest of the file.

@item -t
@itemx --truncate
Test all possible truncation points in the range specified by
@option{--position} and @option{--size}.

@item -v
@itemx --verbose
Verbose mode.

@item -z @var{command}
@itemx --zcmp=@var{command}
Set zcmp command name and options. Defaults to @samp{zcmp}. Use
@option{--zcmp=false} to disable comparisons. When testing a decompressor
different from the one zcmp uses by default, unzcrash and zcmp must be
forced to use the same decompressor with a command like
@w{@samp{unzcrash --zcmp='zcmp --lz=plzip' 'plzip -t' @var{file}}}

@end table
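
As a rough illustration of how these options combine, the commands below
use only options described in the table above. The file name
@file{file.lz} and the use of @w{@samp{lzip -t}} as the decompressor command
are assumptions made for the example. The last command also assumes that
byte 5 of the file holds the coded dictionary size (@pxref{File format}).

@example
# file.lz and 'lzip -t' are placeholders chosen for this sketch.

# Zero 512-byte blocks every 256 bytes, so consecutive blocks overlap.
unzcrash --block=512,0 --delta=256 'lzip -t' file.lz

# Test truncation points every 16 bytes in the first 4096 bytes
# of a known good file, skipping the initial checks.
unzcrash -n --truncate --position=0 --size=4096 --delta=16 'lzip -t' file.lz

# XOR byte 5 with 1 before the first test call to the decompressor.
unzcrash --set-byte=5,f1 'lzip -t' file.lz
@end example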

Exit status: 0 for a normal exit, 1 for environmental problems
(file not found, invalid command-line options, I/O errors, etc.), 2 to
indicate a corrupt or invalid input file, 3 for an internal consistency
error (e.g., bug) which caused unzcrash to panic.
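
A script can act on these values. The following is a minimal sketch,
assuming a POSIX shell, a hypothetical file @file{file.lz}, and
@w{@samp{lzip -t}} as the decompressor under test:

@example
# file.lz and 'lzip -t' are placeholders chosen for this sketch.
unzcrash -q 'lzip -t' file.lz
case $? in
  0) echo "unzcrash exited normally" ;;
  1) echo "environmental problem" ;;
  2) echo "corrupt or invalid input file" ;;
  3) echo "internal consistency error" ;;
esac
@end example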


@node Problems
@chapter Reporting bugs
@cindex bugs
@cindex getting help

There are probably bugs in lziprecover. There are certainly errors and
omissions in this manual. If you report them, they will get fixed. If
you don't, no one will ever know about them and they will remain unfixed
for all eternity, if not longer.

If you find a bug in lziprecover, please send electronic mail to
@email{lzip-bug@@nongnu.org}. Include the version number, which you can
find by running @w{@samp{lziprecover --version}}.


@node Concept index
@unnumbered Concept index

@printindex cp

@bye