Merging upstream version 1.20.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Date: 2025-02-21 11:28:50 +01:00
commit e24aefbbb2
parent 72bcf08df5
31 changed files with 1242 additions and 685 deletions
--- a/doc/lziprecover.info
+++ b/doc/lziprecover.info
@@ -12,14 +12,14 @@ File: lziprecover.info,  Node: Top,  Next: Introduction,  Up: (dir)
 Lziprecover Manual
 ******************

-This manual is for Lziprecover (version 1.19, 10 April 2017).
+This manual is for Lziprecover (version 1.20, 12 February 2018).

 * Menu:

 * Introduction::           Purpose and features of lziprecover
 * Invoking lziprecover::   Command line interface
 * Data safety::            Protecting data from accidental loss
-* Repairing files::        Fixing bit-flip and similar errors
+* Repairing files::        Fixing bit flips and similar errors
 * Merging files::          Fixing several damaged copies
 * File names::             Names of the files produced by lziprecover
 * File format::            Detailed format of the compressed file
@@ -30,7 +30,7 @@ This manual is for Lziprecover (version 1.19, 10 April 2017).
 * Concept index::          Index of concepts


-   Copyright (C) 2009-2017 Antonio Diaz Diaz.
+   Copyright (C) 2009-2018 Antonio Diaz Diaz.

   This manual is free documentation: you have unlimited permission to
 copy, distribute and modify it.
@@ -58,7 +58,7 @@ archiving, taking into account both data integrity and decoder
 availability:

   * The lzip format provides very safe integrity checking and some data
-     recovery means. The lziprecover program can repair bit-flip errors
+     recovery means. The lziprecover program can repair bit flip errors
     (one of the most common forms of data corruption) in lzip files,
     and provides data recovery capabilities, including error-checked
     merging of damaged copies of a file. *Note Data safety::.
@@ -123,7 +123,7 @@ When decompressing or testing, '-' used as a FILE argument means
 standard input. It can be mixed with other FILES and is read just once,
 the first time it appears in the command line.

-   Lziprecover supports the following options:
+   lziprecover supports the following options:

 '-h'
 '--help'
@@ -162,24 +162,25 @@ the first time it appears in the command line.
     Write decompressed data to standard output; keep input files
     unchanged.  This option is needed when reading from a named pipe
     (fifo) or from a device. Use it also to recover as much of the
-     uncompressed data as possible when decompressing a corrupt file.
+     decompressed data as possible when decompressing a corrupt file.

 '-d'
 '--decompress'
-     Decompress the specified file(s). If a file does not exist or
-     can't be opened, lziprecover continues decompressing the rest of
-     the files. If a file fails to decompress, lziprecover exits
-     immediately without decompressing the rest of the files.
+     Decompress the specified files. If a file does not exist or can't
+     be opened, lziprecover continues decompressing the rest of the
+     files. If a file fails to decompress, or is a terminal,
+     lziprecover exits immediately without decompressing the rest of
+     the files.

 '-D RANGE'
 '--range-decompress=RANGE'
     Decompress only a range of bytes starting at decompressed byte
-     position 'BEGIN' and up to byte position 'END - 1'.  This option
-     provides random access to the data in multimember files; it only
-     decompresses the members containing the desired data. In order to
-     guarantee the correctness of the data produced, all members
-     containing any part of the desired data are decompressed and their
-     integrity is verified.
+     position 'BEGIN' and up to byte position 'END - 1'.  Byte
+     positions start at 0. This option provides random access to the
+     data in multimember files; it only decompresses the members
+     containing the desired data. In order to guarantee the correctness
+     of the data produced, all members containing any part of the
+     desired data are decompressed and their integrity is verified.

     Four formats of RANGE are recognized, 'BEGIN', 'BEGIN-END',
     'BEGIN,SIZE', and ',SIZE'. If only BEGIN is specified, END is taken
@@ -206,7 +207,7 @@ the first time it appears in the command line.
 '-l'
 '--list'
     Print the uncompressed size, compressed size and percentage saved
-     of the specified file(s). Trailing data are ignored. The values
+     of the specified files. Trailing data are ignored. The values
     produced are correct even for multimember files. If more than one
     file is given, a final line containing the cumulative sizes is
     printed. With '-v', the dictionary size, the number of members in
@@ -268,12 +269,13 @@ the first time it appears in the command line.

 '-t'
 '--test'
-     Check integrity of the specified file(s), but don't decompress
-     them.  This really performs a trial decompression and throws away
-     the result.  Use it together with '-v' to see information about
-     the file(s). If a file fails the test, does not exist, can't be
-     opened, or is a terminal, lziprecover continues checking the rest
-     of the files.
+     Check integrity of the specified files, but don't decompress them.
+     This really performs a trial decompression and throws away the
+     result. Use it together with '-v' to see information about the
+     files. If a file fails the test, does not exist, can't be opened,
+     or is a terminal, lziprecover continues checking the rest of the
+     files. A final diagnostic is shown at verbosity level 1 or higher
+     if any file fails the test when testing multiple files.

 '-v'
 '--verbose'
@@ -283,10 +285,46 @@ the first time it appears in the command line.
     size, trailer contents (CRC, data size, member size), and up to 6
     bytes of trailing data (if any) both in hexadecimal and as a
     string of printable ASCII characters.
+     Two or more '-v' options show the progress of decompression.
     In other modes, increasing verbosity levels show final status,
     progress of operations, and extra information (for example, the
     failed areas).

+'--loose-trailing'
+     When decompressing, testing or listing, allow trailing data whose
+     first bytes are so similar to the magic bytes of a lzip header
+     that they can be confused with a corrupt header. Use this option
+     if a file triggers a "corrupt header" error and the cause is not
+     actually a corrupt header.
+
+'--dump-tdata'
+     Dump the trailing data (if any) of one or more regular files to
+     standard output, or to a file if the '--output' option is used. If
+     more than one file is given, the trailing data of all files are
+     concatenated. If a file does not exist, can't be opened, or is not
+     regular, lziprecover continues processing the rest of the files.
+     If the dump fails in one file, lziprecover exits immediately
+     without processing the rest of the files.
+
+'--remove-tdata'
+     Remove the trailing data from regular files in place. The date of
+     each file is preserved if possible. If the removal fails in one
+     file, lziprecover continues processing the rest of the files. This
+     option may be dangerous if the file is corrupt or if the trailing
+     data contain a forbidden combination of characters. *Note Trailing
+     data::. Verify that 'lzip -cd file.lz | wc -c' and the
+     uncompressed size shown by 'lzip -l file.lz' match before
+     attempting the removal.
+
+'--strip-tdata'
+     Copy one or more regular files to standard output (or to a file if
+     the '--output' option is used), stripping the trailing data (if
+     any) from each file. If more than one file is given, the files are
+     concatenated. If a file does not exist, can't be opened, or is not
+     regular, lziprecover continues processing the rest of the files.
+     If a file fails to copy, lziprecover exits immediately without
+     processing the rest of the files.
+

   Numbers given as arguments to options may be followed by a multiplier
 and an optional 'B' for "byte".
@@ -336,8 +374,8 @@ scientific data, compressed it, and stored two copies on separate
 media. Years later you notice that both copies are corrupt.

   If you compressed with gzip and both copies suffer any damage in the
-data stream, even if it is just one altered bit, the original data can't
-be recovered.
+data stream, even if it is just one altered bit, the original data can
+only be recovered by an expert, if at all.

   If you used bzip2, and if the file is large enough to contain more
 than one compressed data block (usually larger than 900 kB
@@ -363,7 +401,7 @@ Lziprecover can repair perfectly most files with small errors (up to one
 single-byte error per member), without the need of any extra redundance
 at all. If the reparation is successful, the repaired file will be
 identical bit for bit to the original. This makes lzip files resistant
-to bit-flip, one of the most common forms of data corruption.
+to bit flip, one of the most common forms of data corruption.

   The error may be located anywhere in the file except in the first 5
 bytes of each member header or in the 'Member size' field of the
@@ -372,9 +410,9 @@ can be easily repaired with a text editor like GNU Moe (*note File
 format::). If the error is in the member size, it is enough to ignore
 the message about 'bad member size' when decompressing.

-   Bit-flip happens when one bit in the file is changed from 0 to 1 or
+   Bit flip happens when one bit in the file is changed from 0 to 1 or
 vice versa. It may be caused by bad RAM or even by natural radiation. I
-have seen a case of bit-flip in a file stored on an USB flash drive.
+have seen a case of bit flip in a file stored on a USB flash drive.

   One byte may seem small, but most file corruptions not produced by
 transmission errors or I/O errors just affect one byte, or even one bit,
@@ -547,10 +585,11 @@ member. Such trailing data may be:

   * Useful data added by the user; a cryptographically secure hash, a
     description of file contents, etc. It is safe to append any amount
-     of text to a lzip file as long as the text does not begin with the
-     string "LZIP", and does not contain any zero bytes (null
-     characters). Nonzero bytes and zero bytes can't be safely mixed in
-     trailing data.
+     of text to a lzip file as long as none of the first four bytes of
+     the text match the corresponding byte in the string "LZIP", and
+     the text does not contain any zero bytes (null characters).
+     Nonzero bytes and zero bytes can't be safely mixed in trailing
+     data.

   * Garbage added by some not totally successful copy operation.

@@ -558,12 +597,17 @@ member. Such trailing data may be:
     and hash value (for a chosen hash) coincide with those of another
     file.

-   * In very rare cases, trailing data could be the corrupt header of
-     another member. In multimember or concatenated files the
-     probability of corruption happening in the magic bytes is 5 times
-     smaller than the probability of getting a false positive caused by
-     the corruption of the integrity information itself. Therefore it
-     can be considered to be below the noise level.
+   * In rare cases, trailing data could be the corrupt header of another
+     member. In multimember or concatenated files the probability of
+     corruption happening in the magic bytes is 5 times smaller than the
+     probability of getting a false positive caused by the corruption
+     of the integrity information itself. Therefore it can be
+     considered to be below the noise level. Additionally, the test
+     used by lziprecover to discriminate trailing data from a corrupt
+     header has a Hamming distance (HD) of 3, and the 3 bit flips must
+     happen in different magic bytes for the test to fail. In any case,
+     the option '--trailing-error' guarantees that any corrupt header
+     will be detected.

   Trailing data are in no way part of the lzip file format, but tools
 reading lzip files are expected to behave as correctly and usefully as
@@ -574,6 +618,30 @@ like that of user-added data, they are expected to be ignored. In those
 cases where a file containing trailing data must be rejected, the option
 '--trailing-error' can be used. *Note --trailing-error::.

+   Lziprecover facilitates the management of metadata stored as trailing
+data in lzip files. See the following examples:
+
+Example 1: Add a comment or description to a compressed file.
+
+     # First append the comment as trailing data to a lzip file
+     echo 'This file contains this and that' >> file.lz
+     # This command prints the comment to standard output
+     lziprecover --dump-tdata file.lz
+     # This command outputs file.lz without the comment
+     lziprecover --strip-tdata file.lz
+     # This command removes the comment from file.lz
+     lziprecover --remove-tdata file.lz
+
+
+Example 2: Add and verify a cryptographically secure hash. (This may be
+convenient, but a separate copy of the hash must be kept in a safe place
+to guarantee that both file and hash have not been maliciously
+replaced).
+
+     sha256sum < file.lz >> file.lz
+     lziprecover --strip-tdata file.lz | sha256sum -c \
+       <(lziprecover --dump-tdata file.lz)
+

 File: lziprecover.info,  Node: Examples,  Next: Unzcrash,  Prev: Trailing data,  Up: Top

@@ -674,7 +742,9 @@ lziprecover source directory to build it.

   By default, unzcrash reads the specified file and then repeatedly
 decompresses it, increasing 256 times each byte of the compressed data,
-so as to test all possible one-byte errors.
+so as to test all possible one-byte errors. Note that it may take years
+or even centuries to test all possible one-byte errors in a large file
+(tens of MB).

   If the '--block' option is given, unzcrash reads the specified file
 and then repeatedly decompresses it, setting all bytes in each
@@ -711,9 +781,9 @@ by 'zutils'.  *Note Zcmp: (zutils)Zcmp,

   The format for running unzcrash is:

-     unzcrash [OPTIONS] "lzip -tv" FILENAME.lz
+     unzcrash [OPTIONS] 'lzip -t' FILE.lz

-   Unzcrash supports the following options:
+   unzcrash supports the following options:

 '-h'
 '--help'
@@ -742,25 +812,35 @@ by 'zutils'.  *Note Zcmp: (zutils)Zcmp,

 '-B[SIZE][,VALUE]'
 '--block[=SIZE][,VALUE]'
-     Test block errors of given SIZE aligned to a SIZE-byte boundary,
-     simulating a whole sector I/O error. Block SIZE defaults to 512
-     bytes. VALUE defaults to 0.
+     Test block errors of given SIZE, simulating a whole sector I/O
+     error. Block SIZE defaults to 512 bytes. VALUE defaults to 0. By
+     default, only blocks aligned to a SIZE-byte boundary are tested,
+     but this may be changed with the '--delta' option.

 '-d N'
 '--delta=N'
-     Test only one of every N bytes, blocks or truncation sizes,
-     instead of all of them.
+     Test only one byte, block, or truncation size every N bytes,
+     instead of all of them. If the '--block' option is given, N
+     defaults to the block size. Otherwise N defaults to 1. Values of N
+     smaller than the block size result in overlapping blocks, which is
+     convenient for testing because there are usually too few
+     non-overlapping blocks in a file.

 '-e POSITION,VALUE'
 '--set-byte=POSITION,VALUE'
     Set byte at POSITION to VALUE in the internal buffer after reading
-     and testing FILENAME.lz but before the first test call to the
+     and testing FILE.lz but before the first test call to the
     decompressor. If VALUE is preceded by '+', it is added to the
     original value of the byte at POSITION. If VALUE is preceded by
     'f' (flip), it is XORed with the original value of the byte at
     POSITION. This option can be used to run tests with a changed
     dictionary size, for example.

+'-n'
+'--no-verify'
+     Skip initial verification of FILE.lz and 'zcmp'. This may speed
+     things up a lot when testing many (or large) known good files.
+
 '-p BYTES'
 '--position=BYTES'
     First byte position to test in the file. Defaults to 0. Negative
@@ -829,29 +909,32 @@ Concept index
 * introduction:                          Introduction.          (line 6)
 * invoking:                              Invoking lziprecover.  (line 6)
 * merging files:                         Merging files.         (line 6)
+* options:                               Invoking lziprecover.  (line 6)
 * repairing files:                       Repairing files.       (line 6)
 * trailing data:                         Trailing data.         (line 6)
 * unzcrash:                              Unzcrash.              (line 6)
+* usage:                                 Invoking lziprecover.  (line 6)
+* version:                               Invoking lziprecover.  (line 6)



 Tag Table:
 Node: Top231
-Node: Introduction1269
-Node: Invoking lziprecover4646
-Ref: --trailing-error5296
-Node: Data safety12788
-Node: Repairing files14712
-Node: Merging files16635
-Node: File names19397
-Node: File format19861
-Node: Trailing data22289
-Node: Examples24195
-Ref: concat-example24626
-Ref: ddrescue-example25727
-Node: Unzcrash27017
-Node: Problems32021
-Node: Concept index32573
+Node: Introduction1273
+Node: Invoking lziprecover4650
+Ref: --trailing-error5300
+Node: Data safety14832
+Node: Repairing files16783
+Node: Merging files18706
+Node: File names21468
+Node: File format21932
+Node: Trailing data24360
+Node: Examples27595
+Ref: concat-example28026
+Ref: ddrescue-example29127
+Node: Unzcrash30417
+Node: Problems36055
+Node: Concept index36607

 End Tag Table
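The '-D RANGE' entry lists four RANGE forms. This is a minimal sketch of how they might map to a half-open interval of decompressed byte positions (BEGIN up to END - 1, starting at 0). The hypothetical POSIX shell function 'parse_range' assumes that ',SIZE' starts at 0 and that 'BEGIN,SIZE' ends at BEGIN + SIZE. It ignores multipliers and prints no END for a lone 'BEGIN', because the rule for that case falls outside the hunk shown. It is not lziprecover's own parser.

```sh
# Hypothetical helper, not part of lziprecover: prints "BEGIN END" for a
# RANGE argument of '-D', where END is exclusive (last byte is END - 1).
# Assumptions: ',SIZE' means '0,SIZE'; multipliers (k, M, ...) are ignored.
parse_range() {
  local r=$1 begin end=
  case $r in
    ,*)  begin=0; end=$(( ${r#,} )) ;;                        # ',SIZE'
    *,*) begin=$(( ${r%%,*} )); end=$(( begin + ${r#*,} )) ;; # 'BEGIN,SIZE'
    *-*) begin=$(( ${r%%-*} )); end=$(( ${r#*-} )) ;;         # 'BEGIN-END'
    *)   begin=$(( $r )) ;;                                   # 'BEGIN' alone
  esac
  echo "$begin${end:+ $end}"   # END is omitted when it was not given
}
```

For example, 'parse_range 100,50' prints '100 150', meaning bytes 100 to 149.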
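The '--list' entry reports a percentage saved per file and a final line with cumulative sizes. The hypothetical function 'percent_saved' sketches that arithmetic with awk. It assumes the usual definition saved = 100 - 100 * compressed / uncompressed, and it assumes the cumulative percentage is computed from the summed sizes. It does not read lzip files.

```sh
# Hypothetical sketch of the arithmetic behind '--list': reads lines of
# "UNCOMPRESSED COMPRESSED" sizes and prints the percentage saved for each,
# assuming saved = 100 - 100 * compressed / uncompressed. With more than one
# line it also prints a cumulative line computed from the summed sizes.
percent_saved() {
  awk '{ u += $1; c += $2; printf "%6.2f%%\n", 100 - 100 * $2 / $1 }
       END { if (NR > 1) printf "%6.2f%% total\n", 100 - 100 * c / u }'
}
```

With inputs '1000 500' and '3000 750' it prints 50.00% and 75.00%, and 68.75% for the totals (4000 and 1250 bytes). Averaging the two percentages would give 62.50% instead.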
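A bit flip, as defined in the 'Repairing files' text, is equivalent to XORing a byte with a power of two. Each byte therefore has exactly 8 single-bit-flip variants among the 255 wrong values that a one-byte error can produce. The hypothetical function 'bit_flips' lists those 8 values for a given byte.

```sh
# Hypothetical helper: prints the 8 values a byte can take after a single
# bit flip, i.e. the byte XORed with 1, 2, 4, ..., 128.
bit_flips() {
  local b=$(( $1 )) mask=1
  while [ "$mask" -le 128 ]; do
    echo $(( b ^ mask ))
    mask=$(( mask * 2 ))
  done
}
```

For the byte 'L' (0x4C, decimal 76) it prints 77 78 72 68 92 108 12 204.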
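The rule for safely appending text as trailing data can be checked mechanically. The hypothetical function 'tdata_text_is_safe' is a sketch, not part of lziprecover. It reads a candidate text file with 'od' and fails if any of its first four bytes equals the corresponding byte of "LZIP" (76, 90, 73, 80 in decimal), or if the text contains a zero byte.

```sh
# Hypothetical helper, not part of lziprecover: succeeds (status 0) if the
# text in FILE follows the rule for trailing data: none of its first four
# bytes equals the corresponding byte of "LZIP" and it has no zero bytes.
tdata_text_is_safe() {
  local i=0 b
  # -v: print every byte, never collapse repeated lines into '*'
  for b in $(od -An -v -tu1 < "$1"); do
    [ "$b" -eq 0 ] && return 1          # zero byte (null character)
    case $i in                          # compare with 'L' 'Z' 'I' 'P'
      0) [ "$b" -eq 76 ] && return 1 ;;
      1) [ "$b" -eq 90 ] && return 1 ;;
      2) [ "$b" -eq 73 ] && return 1 ;;
      3) [ "$b" -eq 80 ] && return 1 ;;
    esac
    i=$(( i + 1 ))
  done
  return 0
}
```

The comment used in Example 1 passes the check. The output of 'sha256sum' also passes, because it starts with lowercase hexadecimal digits and contains no zero bytes.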
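The trailing-data text says that the test separating trailing data from a corrupt header has a Hamming distance of 3. One rule with that property is assumed here for illustration only; it is not taken from lziprecover's source. The rule counts how many of the first four bytes match "LZIP": 4 matches mean a header, 2 or 3 a corrupt header, and 0 or 1 trailing data. Under this rule a valid header needs at least 3 bit flips, in 3 different magic bytes, before it is misread as trailing data. In this model, '--loose-trailing' would amount to accepting the 'corrupt-header' case as trailing data. The function name 'classify_after_member' is hypothetical.

```sh
# Hypothetical rule, NOT taken from lziprecover's source: count how many of
# the first four bytes match "LZIP" and classify the data after a member.
# 4 matches = header, 2 or 3 = corrupt header, 0 or 1 = trailing data.
classify_after_member() {
  local matches=0 i=0 b
  for b in $(printf '%s' "$1" | od -An -v -tu1 -N4); do
    case $i:$b in
      0:76|1:90|2:73|3:80) matches=$(( matches + 1 )) ;;   # 'L' 'Z' 'I' 'P'
    esac
    i=$(( i + 1 ))
  done
  if [ "$matches" -eq 4 ]; then echo header
  elif [ "$matches" -ge 2 ]; then echo corrupt-header
  else echo trailing-data
  fi
}
```

'LZIQ' (one flipped bit in 'P') is still reported as a corrupt header. 'M[HP' (one flip in each of three bytes) slips through as trailing data.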
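The unzcrash warning about years or centuries follows from simple arithmetic. The total time is about 256 trial decompressions per byte, times the file size, times the time per trial. The hypothetical function 'years_to_test_all_bytes' does that calculation with integer shell arithmetic. The time per trial is an input you must supply, and the value in the example is an assumption, not a measurement.

```sh
# Back-of-the-envelope estimate with assumed inputs: about 256 trial
# decompressions per byte, so total time ~ FILE_SIZE * 256 * TIME_PER_TRIAL.
# Prints whole years given FILE_SIZE in bytes and TIME_PER_TRIAL in ms.
years_to_test_all_bytes() {
  local trials=$(( $1 * 256 ))
  local seconds=$(( trials * $2 / 1000 ))
  echo $(( seconds / 31557600 ))    # one year = 365.25 days of 86400 s
}
```

'years_to_test_all_bytes 20000000 200' (a 20 MB file at an assumed 200 ms per trial) prints 32.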
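One way to picture the interaction of unzcrash's '--block' and '--delta' options is to list the blocks that would be overwritten. The hypothetical function 'block_ranges' is a sketch, not unzcrash's code. It starts at position 0 and steps by N, which defaults to the block size. Clipping the last block at end of file is an assumption of this sketch.

```sh
# Hypothetical sketch, not unzcrash's code: prints the byte ranges
# BEGIN-END (END exclusive) overwritten by '--block=SIZE --delta=N' in a
# file of FILE_SIZE bytes. Clipping the last block at EOF is assumed.
block_ranges() {    # usage: block_ranges FILE_SIZE [SIZE [N]]
  local file_size=$1 size=${2:-512}
  local delta=${3:-$size}   # with --block, N defaults to the block size
  local pos=0 end
  while [ "$pos" -lt "$file_size" ]; do
    end=$(( pos + size ))
    [ "$end" -gt "$file_size" ] && end=$file_size
    echo "$pos-$end"
    pos=$(( pos + delta ))
  done
}
```

With a 1024-byte file, 'block_ranges 1024 512 256' prints 0-512, 256-768, 512-1024 and 768-1024. This shows the overlapping blocks produced by an N smaller than the block size.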
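The three forms of VALUE accepted by unzcrash's '--set-byte' can be modeled as replace, add and XOR. The hypothetical function 'new_byte_value' computes the resulting byte. Wrapping the result modulo 256 is an assumption of this sketch, not documented behavior.

```sh
# Hypothetical model of '-e POSITION,VALUE': prints the new value of the
# byte at POSITION given its ORIGINAL value and the VALUE part.
# Wrapping the result modulo 256 is an assumption of this sketch.
new_byte_value() {   # usage: new_byte_value ORIGINAL VALUE
  local orig=$(( $1 )) spec=$2
  case $spec in
    +*) echo $(( (orig + ${spec#+}) & 255 )) ;;  # '+': add to original
    f*) echo $(( (orig ^ ${spec#f}) & 255 )) ;;  # 'f' (flip): XOR
    *)  echo $(( $spec & 255 )) ;;               # plain VALUE: replace
  esac
}
```

In the lzip member header, the byte at position 5 holds the coded dictionary size; it follows the 4 magic bytes and the version byte. An argument such as '-e 5,f1' would therefore test a flipped dictionary-size bit.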