Merging upstream version 1.5.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-02-24 04:12:55 +01:00
commit 66060d80f9
parent 5e1f92d2a0
20 changed files with 632 additions and 272 deletions
--- a/doc/plzip.texi
+++ b/doc/plzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header

-@set UPDATED 9 July 2015
-@set VERSION 1.4
+@set UPDATED 14 May 2016
+@set VERSION 1.5

 @dircategory Data Compression
 @direntry
@@ -41,12 +41,14 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
 * File format::            Detailed format of the compressed file
 * Memory requirements::    Memory required to compress and decompress
 * Minimum file sizes::     Minimum file sizes required for full speed
+* Trailing data::          Extra data appended to the file
+* Examples::               A small tutorial with examples
 * Problems::               Reporting bugs
 * Concept index::          Index of concepts
 @end menu

 @sp 1
-Copyright @copyright{} 2009-2015 Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2016 Antonio Diaz Diaz.

 This manual is free documentation: you have unlimited permission
 to copy, distribute and modify it.
@@ -83,7 +85,7 @@ program can repair bit-flip errors (one of the most common forms of data
 corruption) in lzip files, and provides data recovery capabilities,
 including error-checked merging of damaged copies of a file.
 @ifnothtml
-@ref{Data safety,,,lziprecover}.
+@xref{Data safety,,,lziprecover}.
 @end ifnothtml

 @item
@@ -144,13 +146,6 @@ or more compressed files. The result is the concatenation of the
 corresponding uncompressed files. Integrity testing of concatenated
 compressed files is also supported.

-WARNING! Even if plzip is bug-free, other causes may result in a corrupt
-compressed file (bugs in the system libraries, memory errors, etc).
-Therefore, if the data you are going to compress are important, give the
-@samp{--keep} option to plzip and do not remove the original file until
-you verify the compressed file with a command like
-@w{@samp{plzip -cd file.lz | cmp file -}}.
-

 @node Invoking plzip
 @chapter Invoking plzip
@@ -165,6 +160,11 @@ The format for running plzip is:
 plzip [@var{options}] [@var{files}]
 @end example

+@noindent
+@samp{-} used as a @var{file} argument means standard input. It can be
+mixed with other @var{files} and is read just once, the first time it
+appears in the command line.
+
 Plzip supports the following options:

 @table @code
@@ -176,6 +176,13 @@ Print an informative help message describing the options and exit.
 @itemx --version
 Print the version number of plzip on the standard output and exit.

+@anchor{--trailing-error}
+@item -a
+@itemx --trailing-error
+Exit with error status 2 if any remaining input is detected after
+decompressing the last member. Such remaining input is usually trailing
+garbage that can be safely ignored. @xref{concat-example}.
+
 @anchor{--data-size}
 @item -B @var{bytes}
 @itemx --data-size=@var{bytes}
@@ -188,12 +195,17 @@ data size.

 @item -c
 @itemx --stdout
-Compress or decompress to standard output. Needed when reading from a
-named pipe (fifo) or from a device.
+Compress or decompress to standard output; keep input files unchanged.
+If compressing several files, each file is compressed independently.
+This option is needed when reading from a named pipe (fifo) or from a
+device.

 @item -d
 @itemx --decompress
-Decompress.
+Decompress the specified file(s). If a file does not exist or can't be
+opened, plzip continues decompressing the rest of the files. If a file
+fails to decompress, plzip exits immediately without decompressing the
+rest of the files.

 @item -f
 @itemx --force
@@ -238,11 +250,13 @@ Quiet operation. Suppress all messages.

 @item -s @var{bytes}
 @itemx --dictionary-size=@var{bytes}
-Set the dictionary size limit in bytes. Valid values range from 4 KiB to
-512 MiB. Plzip will use the smallest possible dictionary size for each
-file without exceeding this limit. Note that dictionary sizes are
-quantized. If the specified size does not match one of the valid sizes,
-it will be rounded upwards by adding up to (@var{bytes} / 16) to it.
+Set the dictionary size limit in bytes. Plzip will use the smallest
+possible dictionary size for each file without exceeding this limit.
+Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are
+interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that
+dictionary sizes are quantized. If the specified size does not match one
+of the valid sizes, it will be rounded upwards by adding up to
+@w{(@var{bytes} / 8)} to it.

 For maximum compression you should use a dictionary size limit as large
 as possible, but keep in mind that the decompression memory requirement
@@ -252,7 +266,9 @@ is affected at compression time by the choice of dictionary size limit.
 @itemx --test
 Check integrity of the specified file(s), but don't decompress them.
 This really performs a trial decompression and throws away the result.
-Use it together with @samp{-v} to see information about the file.
+Use it together with @samp{-v} to see information about the file(s). If
+a file fails the test, plzip may be unable to check the rest of the
+files.

 @item -v
 @itemx --verbose
@@ -265,14 +281,14 @@ decompressed size, and compressed size.

 @item -0 .. -9
 Set the compression parameters (dictionary size and match length limit)
-as shown in the table below. Note that @samp{-9} can be much slower than
-@samp{-0}. These options have no effect when decompressing.
+as shown in the table below. The default compression level is @samp{-6}.
+Note that @samp{-9} can be much slower than @samp{-0}. These options
+have no effect when decompressing.

 The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
-etc, you may need to use the @samp{--match-length} and
-@samp{--dictionary-size} options directly to achieve optimal
-performance.
+etc, you may need to use the @samp{--dictionary-size} and
+@samp{--match-length} options directly to achieve optimal performance.

 @multitable {Level} {Dictionary size} {Match length limit}
 @item Level @tab Dictionary size @tab Match length limit
@@ -324,7 +340,7 @@ caused plzip to panic.

 When compressing, plzip divides the input file into chunks and
 compresses as many chunks simultaneously as worker threads are chosen,
-creating a multi-member compressed file.
+creating a multimember compressed file.

 When decompressing, plzip decompresses as many members simultaneously as
 worker threads are chosen. Files that were compressed with lzip will not
@@ -383,14 +399,14 @@ additional information before, between, or after them.
 Each member has the following structure:
 @verbatim
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-| ID string | VN | DS | Lzma stream | CRC32 |   Data size   |  Member size  |
+| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
 +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 @end verbatim

 All multibyte values are stored in little endian order.

 @table @samp
-@item ID string
+@item ID string (the "magic" bytes)
 A four byte string, identifying the lzip format, with the value "LZIP"
 (0x4C, 0x5A, 0x49, 0x50).

@@ -407,8 +423,8 @@ from the base size to obtain the dictionary size.@*
 Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
 Valid values for dictionary size range from 4 KiB to 512 MiB.

-@item Lzma stream
-The lzma stream, finished by an end of stream marker. Uses default
+@item LZMA stream
+The LZMA stream, finished by an end of stream marker. Uses default
 values for encoder properties.
 @ifnothtml
 @xref{Stream format,,,lzip},
@@ -428,7 +444,7 @@ Size of the uncompressed original data.
 @item Member size (8 bytes)
 Total size of the member, including header and trailer. This field acts
 as a distributed index, allows the verification of stream integrity, and
-facilitates safe recovery of undamaged members from multi-member files.
+facilitates safe recovery of undamaged members from multimember files.

 @end table

@@ -453,8 +469,8 @@ times the data size. Default is 136 MiB.
 For decompression of a regular (seekable) file to another regular file,
 or for testing of a regular file; the dictionary size.

-(Note that regular files with more than 1024 bytes of trailing garbage
-are treated as non-seekable).
+(Note that regular files with more than 1024 bytes of trailing data are
+treated as non-seekable).

 @item
 For testing of a non-seekable file or of standard input; the dictionary
@@ -476,7 +492,7 @@ dictionary size plus up to 35 MiB.

 When compressing, plzip divides the input file into chunks and
 compresses as many chunks simultaneously as worker threads are chosen,
-creating a multi-member compressed file.
+creating a multimember compressed file.

 For this to work as expected (and roughly multiply the compression speed
 by the number of available processors), the uncompressed file must be at
@@ -506,6 +522,133 @@ data size for each level:
 @end multitable


+@node Trailing data
+@chapter Extra data appended to the file
+@cindex trailing data
+
+Sometimes extra data is found appended to a lzip file after the last
+member. Such trailing data may be:
+
+@itemize @bullet
+@item
+Padding added to make the file size a multiple of some block size, for
+example when writing to a tape.
+
+@item
+Garbage added by some not totally successful copy operation.
+
+@item
+Useful data added by the user; a cryptographically secure hash, a
+description of file contents, etc.
+
+@item
+Malicious data added to the file in order to make its total size and
+hash value (for a chosen hash) coincide with those of another file.
+
+@item
+In very rare cases, trailing data could be the corrupt header of another
+member. In multimember or concatenated files the probability of
+corruption happening in the magic bytes is 5 times smaller than the
+probability of getting a false positive caused by the corruption of the
+integrity information itself. Therefore it can be considered to be below
+the noise level.
+@end itemize
+
+Trailing data can be safely ignored in most cases. In some cases, like
+that of user-added data, it is expected to be ignored. In those cases
+where a file containing trailing data must be rejected, the option
+@samp{--trailing-error} can be used. @xref{--trailing-error}.
+
+
+@node Examples
+@chapter A small tutorial with examples
+@cindex examples
+
+WARNING! Even if plzip is bug-free, other causes may result in a corrupt
+compressed file (bugs in the system libraries, memory errors, etc).
+Therefore, if the data you are going to compress are important, give the
+@samp{--keep} option to plzip and don't remove the original file until
+you verify the compressed file with a command like
+@w{@samp{plzip -cd file.lz | cmp file -}}.
+
+@sp 1
+@noindent
+Example 1: Replace a regular file with its compressed version
+@samp{file.lz} and show the compression ratio.
+
+@example
+plzip -v file
+@end example
+
+@sp 1
+@noindent
+Example 2: Like example 1 but the created @samp{file.lz} has a block
+size of 1 MiB. The compression ratio is not shown.
+
+@example
+plzip -B 1MiB file
+@end example
+
+@sp 1
+@noindent
+Example 3: Restore a regular file from its compressed version
+@samp{file.lz}. If the operation is successful, @samp{file.lz} is
+removed.
+
+@example
+plzip -d file.lz
+@end example
+
+@sp 1
+@noindent
+Example 4: Verify the integrity of the compressed file @samp{file.lz}
+and show status.
+
+@example
+plzip -tv file.lz
+@end example
+
+@sp 1
+@noindent
+Example 5: Compress a whole device in /dev/sdc and send the output to
+@samp{file.lz}.
+
+@example
+plzip -c /dev/sdc > file.lz
+@end example
+
+@sp 1
+@anchor{concat-example}
+@noindent
+Example 6: The right way of concatenating compressed files.
+@xref{Trailing data}.
+
+@example
+Don't do this
+  cat file1.lz file2.lz file3.lz | plzip -d
+Do this instead
+  plzip -cd file1.lz file2.lz file3.lz
+@end example
+
+@sp 1
+@noindent
+Example 7: Decompress @samp{file.lz} partially until 10 KiB of
+decompressed data are produced.
+
+@example
+plzip -cd file.lz | dd bs=1024 count=10
+@end example
+
+@sp 1
+@noindent
+Example 8: Decompress @samp{file.lz} partially from decompressed byte
+10000 to decompressed byte 15000 (5000 bytes are produced).
+
+@example
+plzip -cd file.lz | dd bs=1000 skip=10 count=5
+@end example
+
+
 @node Problems
 @chapter Reporting bugs
 @cindex bugs
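The File format and @samp{--dictionary-size} changes in this diff describe how the DS byte encodes a dictionary size and how the 20-byte member trailer is laid out. The following is a minimal Python sketch of that decoding, assuming a well-formed member. The helpers `decode_ds`, `encode_ds`, `dictionary_size_from_option`, `parse_header` and `parse_trailer` are hypothetical names written for illustration and are not plzip's API. `encode_ds` follows the manual's statement that sizes are rounded upwards to the next valid size. Plzip's own C++ code may do this differently.

```python
import struct

MIN_DICT_SIZE = 4 * 1024            # 4 KiB, smallest valid dictionary size
MAX_DICT_SIZE = 512 * 1024 * 1024   # 512 MiB, largest valid dictionary size


def decode_ds(ds: int) -> int:
    """Return the dictionary size in bytes encoded in the DS byte."""
    size = 1 << (ds & 0x1F)             # bits 4-0: log2 of the base size
    if size > MIN_DICT_SIZE:            # a 4 KiB base cannot be reduced further
        size -= (size // 16) * ((ds >> 5) & 7)  # bits 7-5: sixteenths to subtract
    return size


def encode_ds(size: int) -> int:
    """Return the DS byte for the smallest valid dictionary size >= size."""
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError("dictionary size must be between 4 KiB and 512 MiB")
    ds = (size - 1).bit_length()        # smallest n such that 2**n >= size
    if size > MIN_DICT_SIZE:
        base = 1 << ds
        fraction = base // 16
        for n in range(7, 0, -1):       # try the largest subtraction first
            if base - n * fraction >= size:
                ds |= n << 5
                break
    return ds


def dictionary_size_from_option(value: int) -> int:
    """Interpret a -s value: 12 to 29 mean 2**12 to 2**29 bytes."""
    return 1 << value if 12 <= value <= 29 else value


def parse_header(header: bytes) -> tuple:
    """Return (version, dictionary size) from a 6-byte member header."""
    if len(header) < 6 or header[:4] != b"LZIP":   # magic bytes 0x4C 0x5A 0x49 0x50
        raise ValueError("not an lzip member header")
    return header[4], decode_ds(header[5])


def parse_trailer(trailer: bytes) -> tuple:
    """Return (CRC32, data size, member size) from a 20-byte member trailer.

    The trailer ends its member, so it is the last 20 bytes of the file
    only when no trailing data follows the last member.
    """
    if len(trailer) != 20:
        raise ValueError("an lzip member trailer is 20 bytes long")
    # Little endian: 4-byte CRC32, then two 8-byte sizes.
    return struct.unpack("<IQQ", trailer)
```

For instance, `decode_ds(0xD3)` returns 327680 (320 KiB), which matches the manual's worked example. `encode_ds(5000)` produces a byte that decodes to 5120. That is an upward rounding of 120 bytes, well within the (5000 / 8) bound stated for `--dictionary-size`.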