Merging upstream version 1.25~rc1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-21 11:33:25 +01:00
parent 1d67e88e3c
commit b8e73cb85f
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
39 changed files with 978 additions and 742 deletions

@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 1 October 2024
@set VERSION 1.25-pre1
@set UPDATED 18 November 2024
@set VERSION 1.25-rc1
@dircategory Compression
@direntry
@ -38,6 +38,7 @@ This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).
@menu
* Introduction:: Purpose and features of lziprecover
* Invoking lziprecover:: Command-line interface
* Argument syntax:: By convention, options start with a hyphen
* File format:: Detailed format of the compressed file
* Data safety:: Protecting data from accidental loss
* Fec files:: Forward Error Correction
@ -139,8 +140,16 @@ pdlzip.
If the cause of file corruption is a damaged medium, the combination
@w{GNU ddrescue + lziprecover} is the recommended option for recovering data
from damaged lzip files. @xref{ddrescue-example}, and
@ref{ddrescue-example2}, for examples.
from damaged files. @xref{ddrescue-example}, @ref{ddrescue-example2}, and
@ref{ddrescue-example3}, for examples.
@ifnothtml
@xref{Top,GNU ddrescue manual,,ddrescue},
@end ifnothtml
@ifhtml
See the
@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual}
@end ifhtml
for details about ddrescue.
If a file is too damaged for lziprecover to repair it, all the recoverable
data in all members of the file can be extracted with the following command
@ -186,11 +195,7 @@ standard output. Remember to prepend @file{./} to any file name beginning
with a hyphen, or use @samp{--}.
@noindent
lziprecover supports the following
@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
@ifnothtml
@xref{Argument syntax,,,arg_parser}.
@end ifnothtml
lziprecover supports the following options: @xref{Argument syntax}.
@table @code
@item -h
@ -211,12 +216,12 @@ garbage that can be safely ignored. @xref{concat-example}.
@item -A
@itemx --alone-to-lz
Convert lzma-alone files to lzip format without recompressing, just
adding a lzip header and trailer. The conversion minimizes the
dictionary size of the resulting file (and therefore the amount of
memory required to decompress it). Only streamed files with default LZMA
properties can be converted; non-streamed lzma-alone files lack the "End
Of Stream" marker required in lzip files.
Convert lzma-alone files to lzip format without recompressing, just adding a
lzip header and trailer. The conversion minimizes the dictionary size of the
resulting file (and therefore the amount of memory required to decompress
it). Only streamed files with default LZMA properties can be converted;
non-streamed lzma-alone files lack the 'End Of Stream' marker required in
lzip files.
The name of the converted lzip file is derived from that of the original
lzma-alone file as follows:
@ -258,24 +263,27 @@ already exists and @option{--force} has not been specified, lziprecover
continues decompressing the rest of the files and exits with error status 1.
If a file fails to decompress, or is a terminal, lziprecover exits
immediately with error status 2 without decompressing the rest of the files.
A terminal is considered an uncompressed file, and therefore invalid.
A terminal is considered an uncompressed file, and therefore invalid. A
multimember file with one or more empty members is accepted if redirected to
standard input or if '-i' is given.
@item -D @var{range}
@itemx --range-decompress=@var{range}
Decompress only a range of bytes starting at decompressed byte position
@var{begin} and up to byte position @w{@var{end} - 1}. Byte positions start
at 0. This option provides random access to the data in multimember files;
it only decompresses the members containing the desired data. In order to
guarantee the correctness of the data produced, all members containing any
part of the desired data are decompressed and their integrity is checked.
at 0. The bytes produced are sent to standard output unless the option
@option{-o} is used. This option provides random access to the data in
multimember files; it only decompresses the members containing the desired
data. In order to guarantee the correctness of the data produced, all
members containing any part of the desired data are decompressed and their
integrity is checked.
@anchor{range-format}
Four formats of @var{range} are recognized, @samp{@var{begin}},
@samp{@var{begin}-@var{end}}, @samp{@var{begin},@var{size}}, and
@samp{,@var{size}}. If only @var{begin} is specified, @var{end} is taken as
the end of the file. If only @var{size} is specified, @var{begin} is taken
as the beginning of the file. The bytes produced are sent to standard output
unless the option @option{--output} is used.
as the beginning of the file.
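The four range formats can be illustrated with a short Python sketch that
turns a range string into a pair (begin, end), where end is exclusive. The
names parse_range and file_size are hypothetical, numbers are assumed to be
plain decimal, and this is not the parser used by lziprecover itself.

```python
def parse_range(text, file_size):
    """Return (begin, end) for a range string; end is exclusive.

    Hypothetical helper: accepts 'begin', 'begin-end', 'begin,size',
    and ',size', with plain decimal numbers only (an assumption).
    """
    if "," in text:
        # 'begin,size' or ',size' (begin defaults to the start of the file)
        begin_s, size_s = text.split(",", 1)
        begin = int(begin_s) if begin_s else 0
        end = begin + int(size_s)
    elif "-" in text:
        # 'begin-end'
        begin_s, end_s = text.split("-", 1)
        begin, end = int(begin_s), int(end_s)
    else:
        # 'begin' alone: end defaults to the end of the file
        begin, end = int(text), file_size
    if not 0 <= begin < end:
        raise ValueError("empty or negative range: " + text)
    return begin, end


if __name__ == "__main__":
    for spec in ("100", "100-200", "100,50", ",50"):
        print(spec, "->", parse_range(spec, 1000))
```

Under these assumptions, the pair returned for @samp{100,50} covers byte
positions 100 to 149, matching the rule that decompression stops at byte
position @w{@var{end} - 1}.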
@anchor{--reproduce}
@item -e
@ -371,7 +379,8 @@ last) may be wrong.
@item -k
@itemx --keep
Keep (don't delete) input files during decompression.
Keep (don't delete) input files during decompression or conversion from
lzma-alone.
@item -l
@itemx --list
@ -381,9 +390,11 @@ even for multimember files. If more than one file is given, a final line
containing the cumulative sizes is printed. With @option{-v}, the dictionary
size, the number of members in the file, and the amount of trailing data (if
any) are also printed. With @option{-vv}, the positions and sizes of each
member in multimember files are also printed. With @option{-i}, format errors
are ignored, and with @option{-ivv}, gaps between members are shown. The
member numbers shown coincide with the file numbers produced by @option{--split}.
member in multimember files are also printed. A multimember file with one or
more empty members is accepted if redirected to standard input or if '-i' is
given. With @option{-i}, format errors are ignored, and with @option{-ivv},
gaps between members are shown. The member numbers start at 1 and coincide
with the file numbers produced by @option{--split}.
If any file is damaged, does not exist, can't be opened, or is not regular,
the final exit status is @w{> 0}. @option{-lq} can be used to check quickly
@ -402,8 +413,8 @@ the merge mode.
@item -n @var{n}
@itemx --threads=@var{n}
Set the maximum number of worker threads for @option{--fec=create},
overriding the system's default. Valid values range from 1 to "as many as
your system can support". If this option is not used, lziprecover tries to
overriding the system's default. Valid values range from 1 to as many as
your system can support. If this option is not used, lziprecover tries to
detect the number of processors in the system and use it as default value.
@w{@samp{lziprecover --help}} shows the system's default value.
@ -411,7 +422,7 @@ detect the number of processors in the system and use it as default value.
@itemx --output=@var{file}[/]
If repairing, place the repaired output into @var{file} instead of into
@var{file}_fixed.lz. If splitting, the names of the files produced are in
the form @file{rec01@var{file}}, @file{rec02@var{file}}, etc.
the form @file{rec1@var{file}}, @file{rec2@var{file}}, etc.
If creating FEC data and @option{-c} has not been also specified, write the
FEC data to @var{file}. If @var{file} ends with a slash, it is interpreted
@ -458,8 +469,8 @@ members with corrupt headers or trailers. If other lziprecover functions
fail to work on a multimember @var{file} because of damage in headers or
trailers, try to split @var{file} and then work on each member individually.
The names of the files produced are in the form @file{rec01@var{file}},
@file{rec02@var{file}}, etc, and are designed so that the use of wildcards
The names of the files produced are in the form @file{rec1@var{file}},
@file{rec2@var{file}}, etc, and are designed so that the use of wildcards
in subsequent processing, for example,
@w{@samp{lziprecover -cd rec*@var{file} > recovered_data}}, processes the
files in the correct order. The number of digits used in the names varies
@ -473,7 +484,8 @@ together with @option{-v} to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal, lziprecover
continues testing the rest of the files. A final diagnostic is shown at
verbosity level 1 or higher if any file fails the test when testing multiple
files.
files. A multimember file with one or more empty members is accepted if
redirected to standard input or if '-i' is given.
@item -v
@itemx --verbose
@ -489,8 +501,8 @@ operations, and extra information (for example, the failed areas).
@item --dump=[@var{member_list}][:damaged][:empty][:tdata]
Dump the members listed, the damaged members (if any), the empty members (if
any), or the trailing data (if any) of one or more regular multimember files
to standard output, or to a file if the option @option{--output} is used. If
more than one file is given, the elements dumped from all the files are
to standard output, or to a file if the option @option{-o} is used. If more
than one file is given, the elements dumped from all the files are
concatenated. If a file does not exist, can't be opened, or is not regular,
lziprecover continues processing the rest of the files. If the dump fails in
one file, lziprecover exits immediately without processing the rest of the
@ -547,7 +559,7 @@ attempting the removal of trailing data.
@item --strip=[@var{member_list}][:damaged][:empty][:tdata]
Copy one or more regular multimember files to standard output (or to a file
if the option @option{--output} is used), stripping the members listed, the
if the option @option{-o} is used), stripping the members listed, the
damaged members (if any), the empty members (if any), or the trailing data
(if any) from each file. If all members in a file are selected to be
stripped, the trailing data (if any) are also stripped even if @samp{tdata}
@ -559,22 +571,11 @@ the rest of the files. If a file fails to copy, lziprecover exits
immediately without processing the rest of the files. See @option{--dump}
above for a description of the argument.
@item --ignore-empty
When decompressing, testing, or listing, ignore empty members in multimember
files. By default lziprecover exits with error status 2 if any empty member
is found in a multimember file.
@item --ignore-nonzero
When decompressing or testing, ignore a nonzero first byte in the LZMA
stream. By default lziprecover exits with error status 2 if the first LZMA
byte is nonzero in any member of the input files.
Use @w{@samp{lziprecover --nonzero-repair}} to repair any such nonzero bytes.
@item --loose-trailing
When decompressing, testing, or listing, allow trailing data whose first
bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
"corrupt header" error and the cause is not indeed a corrupt header.
'corrupt header' error and the cause is not indeed a corrupt header.
@item --nonzero-repair
Repair in place a nonzero first LZMA byte in the files specified. With
@ -666,13 +667,14 @@ Load the compressed @var{file} into memory, set the byte at @var{position}
to @var{value}, and decompress the modified compressed data to standard
output. If the damaged member can be decompressed to the end (just fails
with a CRC mismatch), the members following it are also decompressed.
@xref{--set-byte}, for a description of @var{value}.
@item -X[@var{position},@var{value}]
@itemx --show-packets[=@var{position},@var{value}]
Load the compressed @var{file} into memory, optionally set the byte at
@var{position} to @var{value}, decompress the modified compressed data
(discarding the output), and print to standard output descriptions of the
LZMA packets being decoded.
LZMA packets being decoded. @xref{--set-byte}, for a description of @var{value}.
@item -Y @var{range}
@itemx --debug-delay=@var{range}
@ -689,6 +691,7 @@ description of @var{range}.
@itemx --debug-byte-repair=@var{position},@var{value}
Load the compressed @var{file} into memory, set the byte at @var{position}
to @var{value}, and then try to repair the byte error. @xref{--byte-repair}.
@xref{--set-byte}, for a description of @var{value}.
@item --gf16
Forces the use of GF(2^16) when creating FEC blocks even if the number of
@ -723,6 +726,59 @@ indicate a corrupt or invalid input file, 3 for an internal consistency
error (e.g., bug) which caused lziprecover to panic.
@node Argument syntax
@chapter Syntax of command-line arguments
@cindex argument syntax
POSIX recommends these conventions for command-line arguments.
@itemize @bullet
@item A command-line argument is an option if it begins with a hyphen
(@samp{-}).
@item Option names are single alphanumeric characters.
@item Certain options require an argument.
@item An option and its argument may or may not appear as separate tokens.
(In other words, the whitespace separating them is optional, unless the
argument is the empty string).
Thus, @w{@option{-o foo}} and @option{-ofoo} are equivalent.
@item One or more options without arguments, followed by at most one option
that takes an argument, may follow a hyphen in a single token.
Thus, @option{-abc} is equivalent to @w{@option{-a -b -c}}.
@item Options typically precede other non-option arguments.
@item The argument @samp{--} terminates all options; any following arguments
are treated as non-option arguments, even if they begin with a hyphen.
@item A token consisting of a single hyphen character is interpreted as an
ordinary non-option argument. By convention, it is used to specify standard
input, standard output, or a file named @samp{-}.
@end itemize
@noindent
GNU adds @dfn{long options} to these conventions:
@itemize @bullet
@item A long option consists of two hyphens (@samp{--}) followed by a name
made of alphanumeric characters and hyphens. Option names are typically one
to three words long, with hyphens to separate words. Abbreviations can be
used for the long option names as long as the abbreviations are unique.
@item A long option and its argument may or may not appear as separate
tokens. In the latter case they must be separated by an equal sign @samp{=}.
Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent.
@end itemize
@noindent
The syntax of options with an optional argument is
@option{-<short_option><argument>} (without whitespace), or
@option{--<long_option>=<argument>}.
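
@noindent
The conventions above can be tried out with the getopt module of the Python
standard library, which follows the same rules for grouped short options,
attached arguments, @samp{--}, and abbreviated long options. This is only a
sketch: the option set (@option{-a}, @option{-b}, @option{-c},
@option{-o}/@option{--output}) and the function name parse are made up for
the illustration, and lziprecover does its own parsing in C++.

```python
import getopt


def parse(argv):
    """Split argv into (options, non-option arguments).

    Hypothetical option set: -a, -b, -c take no argument;
    -o and --output take a required argument.
    """
    return getopt.getopt(argv, "abco:", ["output="])


if __name__ == "__main__":
    # Grouped short options, an attached argument, a long option with '=',
    # and '--' ending option processing.
    print(parse(["-abc", "-ofoo", "--output=bar", "--", "-x"]))
    # An unambiguous abbreviation of a long option, with a separate argument.
    print(parse(["--out", "bar"]))
```

In both calls the abbreviated or attached forms are reported under the full
option name, and everything after @samp{--} is returned as a non-option
argument even though it begins with a hyphen.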
@node File format
@chapter File format
@cindex file format
@ -785,7 +841,7 @@ Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
Valid values for dictionary size range from 4 KiB to 512 MiB.
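The arithmetic of the example can be checked with a small Python sketch. It
assumes the layout implied by the example (bits 4-0 hold the base 2 logarithm
of the base size, bits 7-5 hold the number of sixteenths of the base size to
subtract); the function name decode_dict_size is hypothetical.

```python
def decode_dict_size(coded_byte):
    """Decode a coded dictionary size byte into a size in bytes.

    Assumed layout: bits 4-0 = log2 of the base size,
    bits 7-5 = number of 1/16 fractions of the base size to subtract.
    """
    base_log2 = coded_byte & 0x1F
    fractions = (coded_byte >> 5) & 0x07
    base_size = 1 << base_log2
    size = base_size - fractions * (base_size // 16)
    # Valid dictionary sizes range from 4 KiB to 512 MiB.
    if not 4 * 1024 <= size <= 512 * 1024 * 1024:
        raise ValueError("invalid dictionary size")
    return size


if __name__ == "__main__":
    print(decode_dict_size(0xD3) // 1024, "KiB")
```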
@item LZMA stream
The LZMA stream, finished by an "End Of Stream" marker. Uses default values
The LZMA stream, terminated by an 'End Of Stream' marker. Uses default values
for encoder properties.
@ifnothtml
@xref{Stream format,,,lzip},
@ -932,12 +988,11 @@ identical backups (@pxref{performance-of-merge}).
@chapter Forward Error Correction
@cindex forward error correction
"Forward Error Correction" (FEC) is any way of protecting data from
corruption by creating redundant data that can be used later to repair
errors in the protected data. Lziprecover uses a Hilbert-based Reed-Solomon
code to create one fec file (with extension @file{.fec}) for each file that
needs to be protected. The fec files created by lziprecover are
reproducible.
Forward Error Correction (FEC) is any way of protecting data from corruption
by creating redundant data that can be used later to repair errors in the
protected data. Lziprecover uses a Hilbert-based Reed-Solomon code to create
one fec file (with extension @file{.fec}) for each file that needs to be
protected. The fec files created by lziprecover are reproducible.
Reed-Solomon is the most space-efficient Error Correcting Code (ECC) for
data stored in block devices. It creates redundant FEC blocks in such a way
@ -945,8 +1000,7 @@ that X FEC blocks allow the recuperation of any combination of up to X lost
data blocks. All the blocks (data and FEC) are of the same size, which in
fec files must be a multiple of 512 bytes. Reed-Solomon is not optimum for
corruption affecting random single bits in a file because each corrupt bit
invalidates the whole block containing it. But in block devices, scattered
bit flips should not happen.
invalidates the whole block containing it.
Usually, a corrupt file does not provide an indication of where the
corruption is located. Therefore, each fec file stores one or two arrays of
@ -1000,8 +1054,7 @@ If we have that x = 1, y = 2, and z = 3, then p = 6, q = 14, and r = 13:
Now, if the values of x and y are lost because of data corruption, they can
be recomputed by using any two of the three equations above. For example, if
we replace the known values of z, p, q, and r in equations (1) and (2) we
get:
we replace the known values of z, p, and q in equations (1) and (2) we get:
@example
x + y + 3 = 6 (1b)
@ -1076,13 +1129,12 @@ missing data blocks.
Lziprecover implements GF(2^8) with polynomial 0x11D and GF(2^16) with
polynomial 0x1100B.
A Hilbert matrix is defined as @w{@samp{A[i][j] = 1 / (i + j + 1)}} for i
and j >= 0. But as in a Galois Field addition is exclusive or, applying the
Hilbert definition produces a singular (non invertible) matrix. To avoid
this problem, lziprecover uses a Hilbert matrix starting at row
@w{@samp{gf_size / 2}}. I.e., @w{@samp{A[i][j] = 1 / (i + gf_size / 2 + j)}}
for @w{@samp{0 <= i,j < gf_size / 2}}. (gf_size is the size of the Galois
Field).
A Hilbert matrix is defined as @w{A[i][j] = 1 / (i + j + 1)} for
@w{i,j >= 0}. But, as in a Galois Field the addition is the exclusive or
operation, applying the Hilbert definition produces a singular (non
invertible) matrix. To avoid this problem, lziprecover uses a Hilbert matrix
starting at row @w{r0 = gf_size / 2}. I.e., @w{A[i][j] = 1 / (i + j + r0)}
for @w{0 <= i,j < r0}. (@samp{gf_size} is the size of the Galois Field).
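
The shifted matrix can be sketched in Python for GF(2^8) with polynomial
0x11D. The sketch assumes that the sums in the index @w{i + j + r0} are also
Galois Field additions (exclusive or), which is an interpretation of the
paragraph above and not taken from the lziprecover source. The function names
are hypothetical, and the brute-force inverse is chosen for clarity, not
speed.

```python
GF_SIZE = 256   # size of GF(2^8)
POLY = 0x11D    # reducing polynomial used for GF(2^8)


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in the field is exclusive or
        a <<= 1
        if a & GF_SIZE:          # degree reached 8: reduce modulo POLY
            a ^= POLY
        b >>= 1
    return result


def gf_inv(a):
    """Return the multiplicative inverse of a nonzero element (brute force)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in a Galois Field")
    for candidate in range(1, GF_SIZE):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ArithmeticError("no inverse found")


def shifted_hilbert(rows, cols):
    """Build A[i][j] = 1 / (i + j + r0) with r0 = GF_SIZE / 2.

    Assumption: '+' in the index is exclusive or, so the divisor always has
    its high bit set and is never zero for 0 <= i,j < r0.
    """
    r0 = GF_SIZE // 2
    if rows > r0 or cols > r0:
        raise ValueError("matrix too large for this field")
    return [[gf_inv(i ^ j ^ r0) for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    for row in shifted_hilbert(4, 4):
        print(" ".join(format(v, "02X") for v in row))
```

Under this interpretation every divisor is nonzero, and any two rows and two
columns of the matrix give a nonzero 2x2 determinant, which is the property
needed to solve for lost blocks.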
@node Creating fec files
@ -1113,6 +1165,14 @@ Example 3: Create recursively one fec file for each file in the directory
lziprecover -v -r -Fc -o fec/ datadir
@end example
@noindent
Example 4: Create fec files for a collection of photos stored in directory
@file{photos} and store them in the directory @file{photos-fec}.
@example
lziprecover -v -Fc -o photos-fec/ photos/*
@end example
@node Testing with fec files
@section How to test files using fec files
@ -1143,6 +1203,14 @@ directory @file{fec}.
lziprecover -v -r -Ft --fec-file=fec/ datadir
@end example
@noindent
Example 4: Test the integrity of a collection of photos stored in directory
@file{photos} using fec files from directory @file{photos-fec}.
@example
lziprecover -v -Ft --fec-file=photos-fec/ photos/*
@end example
@node Repairing with fec files
@section How to repair files using fec files
@ -1174,6 +1242,22 @@ directory @file{fec}.
lziprecover -v -r -Fr --fec-file=fec/ datadir
@end example
@anchor{ddrescue-example}
@noindent
Example 4: Recover a collection of photos from a damaged external drive
(@file{/dev/sdc1}). The photos are in directory @file{photos}, and the fec
files are in directory @file{photos-fec}.
@example
ddrescue -b4096 -r10 /dev/sdc1 hdimage mapfile
mount -o loop,ro hdimage /mnt/hdimage
cp -a /mnt/hdimage/photos photos
cp -a /mnt/hdimage/photos-fec photos-fec
umount /mnt/hdimage
lziprecover -v -Fr --fec-file=photos-fec/ photos/*
(Check and rename repaired files. They are named @file{photos/*_fixed})
@end example
@node Fec file format
@section Fec file format
@ -1274,9 +1358,9 @@ The first chksum packet contains an array of CRC32s, while the second chksum
packet (if present) contains an array of CRC32-Cs.
For the expected thousands of bit flips caused by a zeroed sector, a
"symmetric" CRC like CRC32 is probably better than CRC32-C, which detects
all the errors with an odd number of bit flips at the expense of a larger
number of undetected errors with an even number of bit flips.
symmetric CRC like CRC32 is probably better than CRC32-C, which detects all
the errors with an odd number of bit flips at the expense of a larger number
of undetected errors with an even number of bit flips.
@item payload_crc
CRC32 of the crc_array.
@ -1334,9 +1418,9 @@ The file is repaired in memory. Therefore, enough virtual memory
@w{(RAM + swap)} to contain the largest damaged member is required. Member
size is limited to @w{2 GiB} on 32-bit systems.
The error may be located anywhere in the file except in the first 5
bytes of each member header or in the @samp{Member size} field of the
trailer (last 8 bytes of each member). If the error is in the header it
The error may be located anywhere in the file except in the first 5 bytes of
each member header (magic and version) or in the @samp{Member size} field of
the trailer (last 8 bytes of each member). If the error is in the header it
can be easily repaired with a text editor like GNU Moe (@pxref{File
format}). If the error is in the member size, it is enough to ignore the
message about @samp{bad member size} when decompressing.
@ -1349,7 +1433,7 @@ One byte may seem small, but most file corruptions not produced by
transmission errors or I/O errors just affect one byte, or even one bit,
of the file. Also, unlike magnetic media, where errors usually affect a
whole sector, solid-state storage devices tend to produce single-byte
errors, making of lzip the perfect format for data stored on such devices.
errors, which lziprecover can repair.
Repairing a file can take some time. Small files or files with the error
located near the beginning can be repaired in a few seconds. But
@ -1421,19 +1505,10 @@ Note that the number of errors reported by lziprecover (2552) is lower
than the number of corrupt bytes (3104) because contiguous corrupt bytes
are counted as a single multibyte error.
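
The difference between the two counts can be illustrated with a short Python
sketch that compares an original buffer with a damaged copy. The function name
count_errors is hypothetical, and the sketch only shows how runs of contiguous
corrupt bytes collapse into one error each; it is not the comparison code used
by lziprecover.

```python
def count_errors(original, damaged):
    """Return (errors, corrupt_bytes) for two equal-length byte strings.

    A run of contiguous differing bytes counts as one (multibyte) error.
    """
    if len(original) != len(damaged):
        raise ValueError("buffers must have the same length")
    errors = 0
    corrupt_bytes = 0
    in_run = False
    for a, b in zip(original, damaged):
        if a != b:
            corrupt_bytes += 1
            if not in_run:       # first byte of a new run of corrupt bytes
                errors += 1
            in_run = True
        else:
            in_run = False
    return errors, corrupt_bytes


if __name__ == "__main__":
    # Two runs of damage ('XX' and 'Y') covering three corrupt bytes.
    print(count_errors(b"abcdefgh", b"aXXdeYgh"))
```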
@sp 1
@anchor{ddrescue-example}
@anchor{ddrescue-example2}
@noindent
Example 1: Recover a compressed backup from two copies on CD-ROM with
error-checked merging of copies.
@ifnothtml
@xref{Top,GNU ddrescue manual,,ddrescue},
@end ifnothtml
@ifhtml
See the
@uref{http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html,,ddrescue manual}
@end ifhtml
for details about ddrescue.
@example
ddrescue -d -r1 -b2048 /dev/cdrom cdimage1 mapfile1
@ -1451,7 +1526,6 @@ lziprecover -tv backup.tar.lz
backup.tar.lz: ok
@end example
@sp 1
@noindent
Example 2: Recover the first volume of those created with the command
@w{@samp{lzip -b 32MiB -S 650MB big_db}} from two copies,
@ -1608,8 +1682,7 @@ Member reproduced successfully.
Copy of input file reproduced successfully.
@end example
@sp 1
@anchor{ddrescue-example2}
@anchor{ddrescue-example3}
@noindent
Example 2: Recover a damaged backup with a zeroed sector of 4096 bytes at
file position 1019904, using as reference a previous backup. The damaged
@ -1634,7 +1707,6 @@ Member reproduced successfully.
Copy of input file reproduced successfully.
@end example
@sp 1
@noindent
Example 3: Recover a damaged backup with a zeroed sector of 4096 bytes at
file position 1019904, using as reference a file from the filesystem. (If
@ -1790,7 +1862,7 @@ example when writing to a tape. It is safe to append any amount of
padding zero bytes to a lzip file.
@item
Useful data added by the user; an "End Of File" string (to check that the
Useful data added by the user; an 'End Of File' string (to check that the
file has not been truncated), a cryptographically secure hash, a description
of file contents, etc. It is safe to append any amount of text to a lzip
file as long as none of the first four bytes of the text matches the
@ -1844,7 +1916,6 @@ lziprecover --strip=tdata file.lz > stripped_file.lz
lziprecover --remove=tdata file.lz
@end example
@sp 1
@noindent
Example 2: Add and check a cryptographically secure hash. (This may be
convenient, but a separate copy of the hash must be kept in a safe place
@ -2036,10 +2107,11 @@ The number of N-bit errors per byte (N = 1 to 8) is:
@item -B[@var{size}][,@var{value}]
@itemx --block[=@var{size}][,@var{value}]
Test block errors of given @var{size}, simulating a whole sector I/O error.
@var{size} defaults to 512 bytes. @var{value} defaults to 0. By default,
only contiguous, non-overlapping blocks are tested, but this may be changed
with the option @option{--delta}.
Test block errors of given @var{size}, simulating a whole sector I/O error
by setting all the bytes in the block to @var{value} before attempting
decompression. @var{size} defaults to 512 bytes. @var{value} defaults to 0.
By default, only contiguous, non-overlapping blocks are tested, but this may
be changed with the option @option{--delta}.
@item -d @var{n}
@itemx --delta=@var{n}
@ -2049,6 +2121,7 @@ non-overlapping blocks, or truncation sizes. Values of @var{n} smaller than
the block size result in overlapping blocks. (Which is convenient for
testing because there are usually too few non-overlapping blocks in a file).
@anchor{--set-byte}
@item -e @var{position},@var{value}
@itemx --set-byte=@var{position},@var{value}
Set byte at @var{position} to @var{value} in the internal buffer after