Adding upstream version 1.20.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-21 11:28:40 +01:00
parent d7ceba2005
commit df07043ffe
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
31 changed files with 1242 additions and 685 deletions

@ -1,5 +1,5 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
.TH LZIPRECOVER "1" "April 2017" "lziprecover 1.19" "User Commands"
.TH LZIPRECOVER "1" "February 2018" "lziprecover 1.20" "User Commands"
.SH NAME
lziprecover \- recovers data from damaged lzip files
.SH SYNOPSIS
@ -20,6 +20,9 @@ files and test integrity of files.
Lziprecover provides random access to the data in multimember files; it
only decompresses the members containing the desired data.
.PP
Lziprecover facilitates the management of metadata stored as trailing
data in lzip files.
.PP
Lziprecover is not a replacement for regular backups, but a last line of
defense for the case where the backups are also damaged.
.SH OPTIONS
@ -77,6 +80,18 @@ test compressed file integrity
.TP
\fB\-v\fR, \fB\-\-verbose\fR
be verbose (a 2nd \fB\-v\fR gives more)
.TP
\fB\-\-loose\-trailing\fR
allow trailing data seeming corrupt header
.TP
\fB\-\-dump\-tdata\fR
dump trailing data to standard output
.TP
\fB\-\-remove\-tdata\fR
remove trailing data from files in place
.TP
\fB\-\-strip\-tdata\fR
copy files to stdout without trailing data
.PP
If no file names are given, or if a file is '\-', lziprecover decompresses
from standard input to standard output.
@ -92,7 +107,7 @@ Report bugs to lzip\-bug@nongnu.org
.br
Lziprecover home page: http://www.nongnu.org/lzip/lziprecover.html
.SH COPYRIGHT
Copyright \(co 2017 Antonio Diaz Diaz.
Copyright \(co 2018 Antonio Diaz Diaz.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.

@ -12,14 +12,14 @@ File: lziprecover.info, Node: Top, Next: Introduction, Up: (dir)
Lziprecover Manual
******************
This manual is for Lziprecover (version 1.19, 10 April 2017).
This manual is for Lziprecover (version 1.20, 12 February 2018).
* Menu:
* Introduction:: Purpose and features of lziprecover
* Invoking lziprecover:: Command line interface
* Data safety:: Protecting data from accidental loss
* Repairing files:: Fixing bit-flip and similar errors
* Repairing files:: Fixing bit flips and similar errors
* Merging files:: Fixing several damaged copies
* File names:: Names of the files produced by lziprecover
* File format:: Detailed format of the compressed file
@ -30,7 +30,7 @@ This manual is for Lziprecover (version 1.19, 10 April 2017).
* Concept index:: Index of concepts
Copyright (C) 2009-2017 Antonio Diaz Diaz.
Copyright (C) 2009-2018 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.
@ -58,7 +58,7 @@ archiving, taking into account both data integrity and decoder
availability:
* The lzip format provides very safe integrity checking and some data
recovery means. The lziprecover program can repair bit-flip errors
recovery means. The lziprecover program can repair bit flip errors
(one of the most common forms of data corruption) in lzip files,
and provides data recovery capabilities, including error-checked
merging of damaged copies of a file. *Note Data safety::.
@ -123,7 +123,7 @@ When decompressing or testing, '-' used as a FILE argument means
standard input. It can be mixed with other FILES and is read just once,
the first time it appears in the command line.
Lziprecover supports the following options:
lziprecover supports the following options:
'-h'
'--help'
@ -162,24 +162,25 @@ the first time it appears in the command line.
Write decompressed data to standard output; keep input files
unchanged. This option is needed when reading from a named pipe
(fifo) or from a device. Use it also to recover as much of the
uncompressed data as possible when decompressing a corrupt file.
decompressed data as possible when decompressing a corrupt file.
'-d'
'--decompress'
Decompress the specified file(s). If a file does not exist or
can't be opened, lziprecover continues decompressing the rest of
the files. If a file fails to decompress, lziprecover exits
immediately without decompressing the rest of the files.
Decompress the specified files. If a file does not exist or can't
be opened, lziprecover continues decompressing the rest of the
files. If a file fails to decompress, or is a terminal,
lziprecover exits immediately without decompressing the rest of
the files.
'-D RANGE'
'--range-decompress=RANGE'
Decompress only a range of bytes starting at decompressed byte
position 'BEGIN' and up to byte position 'END - 1'. This option
provides random access to the data in multimember files; it only
decompresses the members containing the desired data. In order to
guarantee the correctness of the data produced, all members
containing any part of the desired data are decompressed and their
integrity is verified.
position 'BEGIN' and up to byte position 'END - 1'. Byte
positions start at 0. This option provides random access to the
data in multimember files; it only decompresses the members
containing the desired data. In order to guarantee the correctness
of the data produced, all members containing any part of the
desired data are decompressed and their integrity is verified.
Four formats of RANGE are recognized, 'BEGIN', 'BEGIN-END',
'BEGIN,SIZE', and ',SIZE'. If only BEGIN is specified, END is taken
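
The half-open convention described for '--range-decompress' (bytes BEGIN through END - 1, counting from 0) is the same one Python slices use. The following is a minimal sketch of how the four RANGE notations could be normalized to a (begin, end) pair. `parse_range` is a hypothetical helper, not lziprecover's own parser. It assumes plain decimal numbers without multipliers, assumes that ',SIZE' starts at position 0, and returns an end of `None` for a bare 'BEGIN' because the sentence describing that case is cut off in this hunk.

```python
from typing import Optional, Tuple


def parse_range(spec: str) -> Tuple[int, Optional[int]]:
    """Normalize a RANGE string to a half-open (begin, end) pair.

    Hypothetical helper for illustration only. An end of None means
    "no explicit end given".
    """
    if "," in spec:
        # 'BEGIN,SIZE' or ',SIZE'
        begin_text, size_text = spec.split(",", 1)
        begin = int(begin_text) if begin_text else 0  # assumption for ',SIZE'
        return begin, begin + int(size_text)
    if "-" in spec:
        # 'BEGIN-END': END itself is not included in the output
        begin_text, end_text = spec.split("-", 1)
        return int(begin_text), int(end_text)
    # bare 'BEGIN'
    return int(spec), None
```

With such a pair, `data[begin:end]` selects exactly the bytes the option text describes, so '10-20' and '10,10' both name the ten bytes at positions 10 through 19.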
@ -206,7 +207,7 @@ the first time it appears in the command line.
'-l'
'--list'
Print the uncompressed size, compressed size and percentage saved
of the specified file(s). Trailing data are ignored. The values
of the specified files. Trailing data are ignored. The values
produced are correct even for multimember files. If more than one
file is given, a final line containing the cumulative sizes is
printed. With '-v', the dictionary size, the number of members in
@ -268,12 +269,13 @@ the first time it appears in the command line.
'-t'
'--test'
Check integrity of the specified file(s), but don't decompress
them. This really performs a trial decompression and throws away
the result. Use it together with '-v' to see information about
the file(s). If a file fails the test, does not exist, can't be
opened, or is a terminal, lziprecover continues checking the rest
of the files.
Check integrity of the specified files, but don't decompress them.
This really performs a trial decompression and throws away the
result. Use it together with '-v' to see information about the
files. If a file fails the test, does not exist, can't be opened,
or is a terminal, lziprecover continues checking the rest of the
files. A final diagnostic is shown at verbosity level 1 or higher
if any file fails the test when testing multiple files.
'-v'
'--verbose'
@ -283,10 +285,46 @@ the first time it appears in the command line.
size, trailer contents (CRC, data size, member size), and up to 6
bytes of trailing data (if any) both in hexadecimal and as a
string of printable ASCII characters.
Two or more '-v' options show the progress of decompression.
In other modes, increasing verbosity levels show final status,
progress of operations, and extra information (for example, the
failed areas).
'--loose-trailing'
When decompressing, testing or listing, allow trailing data whose
first bytes are so similar to the magic bytes of a lzip header
that they can be confused with a corrupt header. Use this option
if a file triggers a "corrupt header" error and the cause is not
indeed a corrupt header.
'--dump-tdata'
Dump the trailing data (if any) of one or more regular files to
standard output, or to a file if the '--output' option is used. If
more than one file is given, the trailing data of all files are
concatenated. If a file does not exist, can't be opened, or is not
regular, lziprecover continues processing the rest of the files.
If the dump fails in one file, lziprecover exits immediately
without processing the rest of the files.
'--remove-tdata'
Remove the trailing data from regular files in place. The date of
each file is preserved if possible. If the removal fails in one
file, lziprecover continues processing the rest of the files. This
option may be dangerous if the file is corrupt or if the trailing
data contain a forbidden combination of characters. *Note Trailing
data::. Verify that 'lzip -cd file.lz | wc -c' and the
uncompressed size shown by 'lzip -l file.lz' match before
attempting the removal.
'--strip-tdata'
Copy one or more regular files to standard output (or to a file if
the '--output' option is used), stripping the trailing data (if
any) from each file. If more than one file is given, the files are
concatenated. If a file does not exist, can't be opened, or is not
regular, lziprecover continues processing the rest of the files.
If a file fails to copy, lziprecover exits immediately without
processing the rest of the files.
Numbers given as arguments to options may be followed by a multiplier
and an optional 'B' for "byte".
@ -336,8 +374,8 @@ scientific data, compressed it, and stored two copies on separate
media. Years later you notice that both copies are corrupt.
If you compressed with gzip and both copies suffer any damage in the
data stream, even if it is just one altered bit, the original data can't
be recovered.
data stream, even if it is just one altered bit, the original data can
only be recovered by an expert, if at all.
If you used bzip2, and if the file is large enough to contain more
than one compressed data block (usually larger than 900 kB
@ -363,7 +401,7 @@ Lziprecover can repair perfectly most files with small errors (up to one
single-byte error per member), without the need of any extra redundance
at all. If the reparation is successful, the repaired file will be
identical bit for bit to the original. This makes lzip files resistant
to bit-flip, one of the most common forms of data corruption.
to bit flip, one of the most common forms of data corruption.
The error may be located anywhere in the file except in the first 5
bytes of each member header or in the 'Member size' field of the
@ -372,9 +410,9 @@ can be easily repaired with a text editor like GNU Moe (*note File
format::). If the error is in the member size, it is enough to ignore
the message about 'bad member size' when decompressing.
Bit-flip happens when one bit in the file is changed from 0 to 1 or
Bit flip happens when one bit in the file is changed from 0 to 1 or
vice versa. It may be caused by bad RAM or even by natural radiation. I
have seen a case of bit-flip in a file stored on a USB flash drive.
have seen a case of bit flip in a file stored on a USB flash drive.
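
To make the idea of a bit flip and of a single-byte repair concrete, here is a toy sketch. It is not lziprecover's algorithm: it assumes a CRC32 of the intact bytes is known, and it brute-forces every byte position and value until the CRC matches again. In a real lzip member the CRC covers the decompressed data, so each candidate would also have to be decompressed. The function names are hypothetical.

```python
import zlib
from typing import Optional


def flip_bit(data: bytes, byte_pos: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (0 -> 1 or 1 -> 0)."""
    out = bytearray(data)
    out[byte_pos] ^= 1 << bit
    return bytes(out)


def repair_single_byte(damaged: bytes, good_crc: int) -> Optional[bytes]:
    """Try every value at every position until the CRC32 matches.

    Toy illustration of single-byte error repair; returns None if no
    single-byte change restores the expected CRC.
    """
    buf = bytearray(damaged)
    for pos in range(len(buf)):
        original = buf[pos]
        for candidate in range(256):
            if candidate == original:
                continue
            buf[pos] = candidate
            if zlib.crc32(buf) == good_crc:
                return bytes(buf)
        buf[pos] = original  # restore before moving to the next position
    return None
```

The search costs at most 255 checks per byte, which is why a single damaged byte is tractable while several independent damaged bytes quickly are not.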
One byte may seem small, but most file corruptions not produced by
transmission errors or I/O errors just affect one byte, or even one bit,
@ -547,10 +585,11 @@ member. Such trailing data may be:
* Useful data added by the user; a cryptographically secure hash, a
description of file contents, etc. It is safe to append any amount
of text to a lzip file as long as the text does not begin with the
string "LZIP", and does not contain any zero bytes (null
characters). Nonzero bytes and zero bytes can't be safely mixed in
trailing data.
of text to a lzip file as long as none of the first four bytes of
the text match the corresponding byte in the string "LZIP", and
the text does not contain any zero bytes (null characters).
Nonzero bytes and zero bytes can't be safely mixed in trailing
data.
* Garbage added by some not totally successful copy operation.
@ -558,12 +597,17 @@ member. Such trailing data may be:
and hash value (for a chosen hash) coincide with those of another
file.
* In very rare cases, trailing data could be the corrupt header of
another member. In multimember or concatenated files the
probability of corruption happening in the magic bytes is 5 times
smaller than the probability of getting a false positive caused by
the corruption of the integrity information itself. Therefore it
can be considered to be below the noise level.
* In rare cases, trailing data could be the corrupt header of another
member. In multimember or concatenated files the probability of
corruption happening in the magic bytes is 5 times smaller than the
probability of getting a false positive caused by the corruption
of the integrity information itself. Therefore it can be
considered to be below the noise level. Additionally, the test
used by lziprecover to discriminate trailing data from a corrupt
header has a Hamming distance (HD) of 3, and the 3 bit flips must
happen in different magic bytes for the test to fail. In any case,
the option '--trailing-error' guarantees that any corrupt header
will be detected.
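
The rule for user-added text in the first bullet above can be checked mechanically. The sketch below assumes only what the text states: none of the first four bytes may equal the byte at the same position in "LZIP", and no zero byte may appear anywhere. `magic_matches` and `is_safe_trailing_text` are hypothetical helper names, and this is not the test lziprecover itself performs.

```python
MAGIC = b"LZIP"  # magic bytes at the start of every lzip member header


def magic_matches(text: bytes) -> int:
    """Count how many of the first four bytes equal the same-position byte of MAGIC."""
    return sum(1 for got, want in zip(text[:4], MAGIC) if got == want)


def is_safe_trailing_text(text: bytes) -> bool:
    """Apply the documented rule for text appended to a lzip file.

    Safe means: no positional match with "LZIP" in the first four bytes
    and no zero bytes anywhere in the text.
    """
    return magic_matches(text) == 0 and 0 not in text
```

Under this rule a comment such as 'This file contains this and that' is fine, while text starting with "Lorem" is not, because its first byte matches the 'L' of the magic string.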
Trailing data are in no way part of the lzip file format, but tools
reading lzip files are expected to behave as correctly and usefully as
@ -574,6 +618,30 @@ like that of user-added data, they are expected to be ignored. In those
cases where a file containing trailing data must be rejected, the option
'--trailing-error' can be used. *Note --trailing-error::.
Lziprecover facilitates the management of metadata stored as trailing
data in lzip files. See the following examples:
Example 1: Add a comment or description to a compressed file.
# First append the comment as trailing data to a lzip file
echo 'This file contains this and that' >> file.lz
# This command prints the comment to standard output
lziprecover --dump-tdata file.lz
# This command outputs file.lz without the comment
lziprecover --strip-tdata file.lz
# This command removes the comment from file.lz
lziprecover --remove-tdata file.lz
Example 2: Add and verify a cryptographically secure hash. (This may be
convenient, but a separate copy of the hash must be kept in a safe place
to guarantee that both file and hash have not been maliciously
replaced).
sha256sum < file.lz >> file.lz
lziprecover --strip-tdata file.lz | sha256sum -c \
<(lziprecover --dump-tdata file.lz)
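
The logic of Example 2 can be sketched without the command line tools. The code below is an illustration under stated assumptions: `append_hash` mimics the line that `sha256sum` prints for standard input (hex digest, two spaces, '-'), and `verify_hash` takes the two byte strings that '--strip-tdata' and '--dump-tdata' would produce. Both function names are hypothetical.

```python
import hashlib


def append_hash(compressed: bytes) -> bytes:
    """Return the data followed by a sha256sum-style line ("<hex>  -\\n")."""
    digest = hashlib.sha256(compressed).hexdigest()
    return compressed + f"{digest}  -\n".encode("ascii")


def verify_hash(stripped: bytes, tdata: bytes) -> bool:
    """Compare the hash of the stripped file with the one stored as trailing data."""
    fields = tdata.split()
    if not fields:
        return False  # no trailing data, nothing to verify against
    return hashlib.sha256(stripped).hexdigest() == fields[0].decode("ascii")
```

A lowercase hex digest never starts with any of the uppercase letters of "LZIP" and contains no zero bytes, so such a line satisfies the rule for safe trailing text given earlier in this node.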

File: lziprecover.info, Node: Examples, Next: Unzcrash, Prev: Trailing data, Up: Top
@ -674,7 +742,9 @@ lziprecover source directory to build it.
By default, unzcrash reads the specified file and then repeatedly
decompresses it, increasing 256 times each byte of the compressed data,
so as to test all possible one-byte errors.
so as to test all possible one-byte errors. Note that it may take years
or even centuries to test all possible one-byte errors in a large file
(tens of MB).
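
The "years or even centuries" figure follows from simple arithmetic. The sketch below assumes 255 erroneous values per byte (the 256th increment restores the original value) and a caller-supplied time per trial decompression. The one second used in the usage note is an assumption for illustration, not a measured value.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def one_byte_error_tests(file_size: int) -> int:
    """Number of distinct one-byte errors in a file of file_size bytes."""
    return 255 * file_size


def estimate_years(file_size: int, seconds_per_test: float) -> float:
    """Rough wall-clock estimate for testing every one-byte error in sequence."""
    return one_byte_error_tests(file_size) * seconds_per_test / SECONDS_PER_YEAR
```

For a 30 MB file and an assumed one second per trial decompression, `estimate_years(30_000_000, 1.0)` comes to roughly 242 years.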
If the '--block' option is given, unzcrash reads the specified file
and then repeatedly decompresses it, setting all bytes in each
@ -711,9 +781,9 @@ by 'zutils'. *Note Zcmp: (zutils)Zcmp,
The format for running unzcrash is:
unzcrash [OPTIONS] "lzip -tv" FILENAME.lz
unzcrash [OPTIONS] 'lzip -t' FILE.lz
Unzcrash supports the following options:
unzcrash supports the following options:
'-h'
'--help'
@ -742,25 +812,35 @@ by 'zutils'. *Note Zcmp: (zutils)Zcmp,
'-B[SIZE][,VALUE]'
'--block[=SIZE][,VALUE]'
Test block errors of given SIZE aligned to a SIZE-byte boundary,
simulating a whole sector I/O error. Block SIZE defaults to 512
bytes. VALUE defaults to 0.
Test block errors of given SIZE, simulating a whole sector I/O
error. Block SIZE defaults to 512 bytes. VALUE defaults to 0. By
default, only blocks aligned to a SIZE-byte boundary are tested,
but this may be changed with the '--delta' option.
'-d N'
'--delta=N'
Test only one of every N bytes, blocks or truncation sizes,
instead of all of them.
Test only one byte, block, or truncation size every N bytes,
instead of all of them. If the '--block' option is given, N
defaults to the block size. Else N defaults to 1. Values of N
smaller than the block size will result in overlapping blocks.
(Which is convenient for testing because there are usually too few
non-overlapping blocks in a file).
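
The interaction between '--block' and '--delta' can be pictured by listing the byte ranges that would be overwritten. This is a minimal sketch, not unzcrash's code: `block_ranges` is a hypothetical helper, it starts at position 0, and it assumes that a block reaching past the end of the file is clipped there.

```python
from typing import List, Optional, Tuple


def block_ranges(file_size: int, block_size: int = 512,
                 delta: Optional[int] = None) -> List[Tuple[int, int]]:
    """List the half-open (start, end) ranges tested for block errors.

    delta defaults to block_size, which gives aligned, non-overlapping
    blocks; a smaller delta gives overlapping blocks.
    """
    step = block_size if delta is None else delta
    if block_size <= 0 or step <= 0:
        raise ValueError("block_size and delta must be positive")
    return [(start, min(start + block_size, file_size))
            for start in range(0, file_size, step)]
```

A 2048-byte file yields only four aligned 512-byte blocks, which illustrates why overlapping blocks are useful: halving the delta to 256 already doubles the number of tests.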
'-e POSITION,VALUE'
'--set-byte=POSITION,VALUE'
Set byte at POSITION to VALUE in the internal buffer after reading
and testing FILENAME.lz but before the first test call to the
and testing FILE.lz but before the first test call to the
decompressor. If VALUE is preceded by '+', it is added to the
original value of the byte at POSITION. If VALUE is preceded by
'f' (flip), it is XORed with the original value of the byte at
POSITION. This option can be used to run tests with a changed
dictionary size, for example.
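
The three forms of VALUE accepted by '--set-byte' can be summarized in a few lines. The sketch below is an illustration, not unzcrash's implementation: `apply_set_byte` is a hypothetical name, numbers are assumed to be plain decimal, and the addition is assumed to wrap around modulo 256.

```python
def apply_set_byte(buffer: bytearray, position: int, value: str) -> None:
    """Modify buffer[position] in place according to a VALUE string.

    'N'  sets the byte to N,
    '+N' adds N to the original byte (assumed to wrap modulo 256),
    'fN' XORs the original byte with N (flips the bits set in N).
    """
    original = buffer[position]
    if value.startswith("+"):
        new = (original + int(value[1:])) % 256
    elif value.startswith("f"):
        new = original ^ int(value[1:])
    else:
        new = int(value)
    buffer[position] = new  # raises ValueError if new is outside 0..255
```

For example, with an original byte of 0x10, the values '5', '+5' and 'f255' produce 0x05, 0x15 and 0xEF respectively.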
'-n'
'--no-verify'
Skip initial verification of FILE.lz and 'zcmp'. May speed up
things a lot when testing many (or large) known good files.
'-p BYTES'
'--position=BYTES'
First byte position to test in the file. Defaults to 0. Negative
@ -829,29 +909,32 @@ Concept index
* introduction: Introduction. (line 6)
* invoking: Invoking lziprecover. (line 6)
* merging files: Merging files. (line 6)
* options: Invoking lziprecover. (line 6)
* repairing files: Repairing files. (line 6)
* trailing data: Trailing data. (line 6)
* unzcrash: Unzcrash. (line 6)
* usage: Invoking lziprecover. (line 6)
* version: Invoking lziprecover. (line 6)

Tag Table:
Node: Top231
Node: Introduction1269
Node: Invoking lziprecover4646
Ref: --trailing-error5296
Node: Data safety12788
Node: Repairing files14712
Node: Merging files16635
Node: File names19397
Node: File format19861
Node: Trailing data22289
Node: Examples24195
Ref: concat-example24626
Ref: ddrescue-example25727
Node: Unzcrash27017
Node: Problems32021
Node: Concept index32573
Node: Introduction1273
Node: Invoking lziprecover4650
Ref: --trailing-error5300
Node: Data safety14832
Node: Repairing files16783
Node: Merging files18706
Node: File names21468
Node: File format21932
Node: Trailing data24360
Node: Examples27595
Ref: concat-example28026
Ref: ddrescue-example29127
Node: Unzcrash30417
Node: Problems36055
Node: Concept index36607

End Tag Table

@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 10 April 2017
@set VERSION 1.19
@set UPDATED 12 February 2018
@set VERSION 1.20
@dircategory Data Compression
@direntry
@ -38,7 +38,7 @@ This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).
* Introduction:: Purpose and features of lziprecover
* Invoking lziprecover:: Command line interface
* Data safety:: Protecting data from accidental loss
* Repairing files:: Fixing bit-flip and similar errors
* Repairing files:: Fixing bit flips and similar errors
* Merging files:: Fixing several damaged copies
* File names:: Names of the files produced by lziprecover
* File format:: Detailed format of the compressed file
@ -50,7 +50,7 @@ This manual is for Lziprecover (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
Copyright @copyright{} 2009-2017 Antonio Diaz Diaz.
Copyright @copyright{} 2009-2018 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission
to copy, distribute and modify it.
@ -79,7 +79,7 @@ availability:
@itemize @bullet
@item
The lzip format provides very safe integrity checking and some data
recovery means. The lziprecover program can repair bit-flip errors (one
recovery means. The lziprecover program can repair bit flip errors (one
of the most common forms of data corruption) in lzip files, and provides
data recovery capabilities, including error-checked merging of damaged
copies of a file. @xref{Data safety}.
@ -111,8 +111,8 @@ the compressors in the lzip family; lzip, plzip, minilzip/lzlib, clzip
and pdlzip.
If the cause of file corruption is damaged media, the combination
@w{GNU ddrescue + lziprecover} is the best option for recovering data
from multiple damaged copies. @xref{ddrescue-example}, for an example.
@w{GNU ddrescue + lziprecover} is the best option for recovering data from
multiple damaged copies. @xref{ddrescue-example}, for an example.
If a file is too damaged for lziprecover to repair it, all the
recoverable data in all members of the file can be extracted with the
@ -139,6 +139,9 @@ undergone the process of decompression.
@node Invoking lziprecover
@chapter Invoking lziprecover
@cindex invoking
@cindex options
@cindex usage
@cindex version
The format for running lziprecover is:
@ -151,7 +154,7 @@ When decompressing or testing, @samp{-} used as a @var{file} argument
means standard input. It can be mixed with other @var{files} and is read
just once, the first time it appears in the command line.
Lziprecover supports the following options:
lziprecover supports the following options:
@table @code
@item -h
@ -191,25 +194,25 @@ lzma-alone file as follows:
@itemx --stdout
Write decompressed data to standard output; keep input files unchanged.
This option is needed when reading from a named pipe (fifo) or from a
device. Use it also to recover as much of the uncompressed data as
device. Use it also to recover as much of the decompressed data as
possible when decompressing a corrupt file.
@item -d
@itemx --decompress
Decompress the specified file(s). If a file does not exist or can't be
opened, lziprecover continues decompressing the rest of the files. If a
file fails to decompress, lziprecover exits immediately without
Decompress the specified files. If a file does not exist or can't be
opened, lziprecover continues decompressing the rest of the files. If a file
fails to decompress, or is a terminal, lziprecover exits immediately without
decompressing the rest of the files.
@item -D @var{range}
@itemx --range-decompress=@var{range}
Decompress only a range of bytes starting at decompressed byte position
@samp{@var{begin}} and up to byte position @w{@samp{@var{end} - 1}}.
This option provides random access to the data in multimember files; it
only decompresses the members containing the desired data. In order to
guarantee the correctness of the data produced, all members containing
any part of the desired data are decompressed and their integrity is
verified.
Byte positions start at 0. This option provides random access to the
data in multimember files; it only decompresses the members containing
the desired data. In order to guarantee the correctness of the data
produced, all members containing any part of the desired data are
decompressed and their integrity is verified.
Four formats of @var{range} are recognized, @samp{@var{begin}},
@samp{@var{begin}-@var{end}}, @samp{@var{begin},@var{size}}, and
@ -237,7 +240,7 @@ Keep (don't delete) input files during decompression.
@item -l
@itemx --list
Print the uncompressed size, compressed size and percentage saved of the
specified file(s). Trailing data are ignored. The values produced are
specified files. Trailing data are ignored. The values produced are
correct even for multimember files. If more than one file is given, a
final line containing the cumulative sizes is printed. With @samp{-v},
the dictionary size, the number of members in the file, and the amount
@ -297,11 +300,13 @@ on the number of members in @samp{@var{file}}.
@item -t
@itemx --test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
Use it together with @samp{-v} to see information about the file(s). If
a file fails the test, does not exist, can't be opened, or is a
terminal, lziprecover continues checking the rest of the files.
Check integrity of the specified files, but don't decompress them. This
really performs a trial decompression and throws away the result. Use it
together with @samp{-v} to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal, lziprecover
continues checking the rest of the files. A final diagnostic is shown at
verbosity level 1 or higher if any file fails the test when testing
multiple files.
@item -v
@itemx --verbose
@ -311,9 +316,43 @@ verbosity level, showing status, compression ratio, dictionary size,
trailer contents (CRC, data size, member size), and up to 6 bytes of
trailing data (if any) both in hexadecimal and as a string of printable
ASCII characters.@*
Two or more @samp{-v} options show the progress of decompression.@*
In other modes, increasing verbosity levels show final status, progress
of operations, and extra information (for example, the failed areas).
@item --loose-trailing
When decompressing, testing or listing, allow trailing data whose first
bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
"corrupt header" error and the cause is not indeed a corrupt header.
@item --dump-tdata
Dump the trailing data (if any) of one or more regular files to standard
output, or to a file if the @samp{--output} option is used. If more than
one file is given, the trailing data of all files are concatenated. If a
file does not exist, can't be opened, or is not regular, lziprecover
continues processing the rest of the files. If the dump fails in one
file, lziprecover exits immediately without processing the rest of the
files.
@item --remove-tdata
Remove the trailing data from regular files in place. The date of each
file is preserved if possible. If the removal fails in one file,
lziprecover continues processing the rest of the files. This option may
be dangerous if the file is corrupt or if the trailing data contain a
forbidden combination of characters. @xref{Trailing data}. Verify that
@w{@samp{lzip -cd file.lz | wc -c}} and the uncompressed size shown by
@w{@samp{lzip -l file.lz}} match before attempting the removal.
@item --strip-tdata
Copy one or more regular files to standard output (or to a file if the
@samp{--output} option is used), stripping the trailing data (if any)
from each file. If more than one file is given, the files are
concatenated. If a file does not exist, can't be opened, or is not
regular, lziprecover continues processing the rest of the files. If a
file fails to copy, lziprecover exits immediately without processing the
rest of the files.
@end table
Numbers given as arguments to options may be followed by a multiplier
@ -365,12 +404,12 @@ compressed it, and stored two copies on separate media. Years later you
notice that both copies are corrupt.
If you compressed with gzip and both copies suffer any damage in the
data stream, even if it is just one altered bit, the original data can't
be recovered.
data stream, even if it is just one altered bit, the original data can
only be recovered by an expert, if at all.
If you used bzip2, and if the file is large enough to contain more than
one compressed data block (usually larger than 900 kB uncompressed), and
if no block is damaged in both files, then the data can be manually
one compressed data block (usually larger than @w{900 kB} uncompressed),
and if no block is damaged in both files, then the data can be manually
recovered by splitting the files with bzip2recover, verifying every
block and then copying the right blocks in the right order into another
file.
@ -391,7 +430,7 @@ Lziprecover can repair perfectly most files with small errors (up to one
single-byte error per member), without the need of any extra redundance
at all. If the reparation is successful, the repaired file will be
identical bit for bit to the original. This makes lzip files resistant
to bit-flip, one of the most common forms of data corruption.
to bit flip, one of the most common forms of data corruption.
The error may be located anywhere in the file except in the first 5
bytes of each member header or in the @samp{Member size} field of the
@ -400,9 +439,9 @@ can be easily repaired with a text editor like GNU Moe (@pxref{File
format}). If the error is in the member size, it is enough to ignore the
message about @samp{bad member size} when decompressing.
Bit-flip happens when one bit in the file is changed from 0 to 1 or vice
Bit flip happens when one bit in the file is changed from 0 to 1 or vice
versa. It may be caused by bad RAM or even by natural radiation. I have
seen a case of bit-flip in a file stored on a USB flash drive.
seen a case of bit flip in a file stored on a USB flash drive.
One byte may seem small, but most file corruptions not produced by
transmission errors or I/O errors just affect one byte, or even one bit,
@ -463,7 +502,7 @@ into clusters and then merging the files as if each cluster were a
single error.
Here is a real case of successful merging. Two copies of the file
@samp{icecat-3.5.3-x86.tar.lz} (compressed size 9 MB) became corrupt
@samp{icecat-3.5.3-x86.tar.lz} (compressed size @w{9 MB}) became corrupt
while stored on the same NAND flash device. One of the copies had 76
single-bit errors scattered in an area of 1020 bytes, and the other had
3028 such errors in an area of 31729 bytes. Lziprecover produced a
@ -592,9 +631,10 @@ padding zero bytes to a lzip file.
@item
Useful data added by the user; a cryptographically secure hash, a
description of file contents, etc. It is safe to append any amount of
text to a lzip file as long as the text does not begin with the string
"LZIP", and does not contain any zero bytes (null characters). Nonzero
bytes and zero bytes can't be safely mixed in trailing data.
text to a lzip file as long as none of the first four bytes of the text
match the corresponding byte in the string "LZIP", and the text does not
contain any zero bytes (null characters). Nonzero bytes and zero bytes
can't be safely mixed in trailing data.
@item
Garbage added by some not totally successful copy operation.
@ -604,12 +644,16 @@ Malicious data added to the file in order to make its total size and
hash value (for a chosen hash) coincide with those of another file.
@item
In very rare cases, trailing data could be the corrupt header of another
In rare cases, trailing data could be the corrupt header of another
member. In multimember or concatenated files the probability of
corruption happening in the magic bytes is 5 times smaller than the
probability of getting a false positive caused by the corruption of the
integrity information itself. Therefore it can be considered to be below
the noise level.
the noise level. Additionally, the test used by lziprecover to discriminate
trailing data from a corrupt header has a Hamming distance (HD) of 3,
and the 3 bit flips must happen in different magic bytes for the test to
fail. In any case, the option @samp{--trailing-error} guarantees that
any corrupt header will be detected.
@end itemize
Trailing data are in no way part of the lzip file format, but tools
@ -621,6 +665,35 @@ that of user-added data, they are expected to be ignored. In those cases
where a file containing trailing data must be rejected, the option
@samp{--trailing-error} can be used. @xref{--trailing-error}.
Lziprecover facilitates the management of metadata stored as trailing
data in lzip files. See the following examples:
@noindent
Example 1: Add a comment or description to a compressed file.
@example
# First append the comment as trailing data to a lzip file
echo 'This file contains this and that' >> file.lz
# This command prints the comment to standard output
lziprecover --dump-tdata file.lz
# This command outputs file.lz without the comment
lziprecover --strip-tdata file.lz
# This command removes the comment from file.lz
lziprecover --remove-tdata file.lz
@end example
@sp 1
@noindent
Example 2: Add and verify a cryptographically secure hash. (This may be
convenient, but a separate copy of the hash must be kept in a safe place
to guarantee that both file and hash have not been maliciously replaced).
@example
sha256sum < file.lz >> file.lz
lziprecover --strip-tdata file.lz | sha256sum -c \
<(lziprecover --dump-tdata file.lz)
@end example
@node Examples
@chapter A small tutorial with examples
@ -658,7 +731,7 @@ Do this instead
@sp 1
@noindent
Example 4: Decompress @samp{file.lz} partially until 10 KiB of
Example 4: Decompress @samp{file.lz} partially until @w{10 KiB} of
decompressed data are produced.
@example
@ -756,7 +829,9 @@ lziprecover source directory to build it.
By default, unzcrash reads the specified file and then repeatedly
decompresses it, increasing 256 times each byte of the compressed data,
so as to test all possible one-byte errors.
so as to test all possible one-byte errors. Note that it may take years
or even centuries to test all possible one-byte errors in a large file
(tens of MB).
If the @code{--block} option is given, unzcrash reads the specified file
and then repeatedly decompresses it, setting all bytes in each
@ -801,10 +876,10 @@ See
The format for running unzcrash is:
@example
unzcrash [@var{options}] "lzip -tv" @var{filename}.lz
unzcrash [@var{options}] 'lzip -t' @var{file}.lz
@end example
Unzcrash supports the following options:
unzcrash supports the following options:
@table @code
@item -h
@ -835,24 +910,34 @@ The number of N-bit errors per byte (N = 1 to 8) is:
@item -B[@var{size}][,@var{value}]
@itemx --block[=@var{size}][,@var{value}]
Test block errors of given @var{size} aligned to a @var{size}-byte
boundary, simulating a whole sector I/O error. Block @var{size} defaults
to 512 bytes. @var{value} defaults to 0.
Test block errors of given @var{size}, simulating a whole sector I/O
error. Block @var{size} defaults to 512 bytes. @var{value} defaults to
0. By default, only blocks aligned to a @var{size}-byte boundary are
tested, but this may be changed with the @code{--delta} option.
@item -d @var{n}
@itemx --delta=@var{n}
Test only one of every @var{n} bytes, blocks or truncation sizes,
instead of all of them.
Test only one byte, block, or truncation size every @var{n} bytes,
instead of all of them. If the @code{--block} option is given, @var{n}
defaults to the block size. Else @var{n} defaults to 1. Values of
@var{n} smaller than the block size will result in overlapping blocks.
(Which is convenient for testing because there are usually too few
non-overlapping blocks in a file).
@item -e @var{position},@var{value}
@itemx --set-byte=@var{position},@var{value}
Set byte at @var{position} to @var{value} in the internal buffer after
reading and testing @var{filename}.lz but before the first test call to
the decompressor. If @var{value} is preceded by @samp{+}, it is added to
the original value of the byte at @var{position}. If @var{value} is
preceded by @samp{f} (flip), it is XORed with the original value of the
byte at @var{position}. This option can be used to run tests with a
changed dictionary size, for example.
reading and testing @var{file}.lz but before the first test call to the
decompressor. If @var{value} is preceded by @samp{+}, it is added to the
original value of the byte at @var{position}. If @var{value} is preceded
by @samp{f} (flip), it is XORed with the original value of the byte at
@var{position}. This option can be used to run tests with a changed
dictionary size, for example.
@item -n
@itemx --no-verify
Skip initial verification of @var{file}.lz and @samp{zcmp}. May speed up
things a lot when testing many (or large) known good files.
@item -p @var{bytes}
@itemx --position=@var{bytes}