Merging upstream version 1.3~pre1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-02-24 04:08:02 +01:00
commit e4e17ab53e
parent f04d94e9dd
17 changed files with 387 additions and 259 deletions
--- a/doc/plzip.texi
+++ b/doc/plzip.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header

-@set UPDATED 29 August 2014
-@set VERSION 1.2
+@set UPDATED 25 November 2014
+@set VERSION 1.3-pre1

@dircategory Data Compression
@direntry
@@ -39,6 +39,8 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
 * Program design::         Internal structure of plzip
 * Invoking plzip::         Command line interface
 * File format::            Detailed format of the compressed file
+* Memory requirements::    Memory required to compress and decompress
+* Minimum file sizes::     Minimum file sizes required for full speed
 * Problems::               Reporting bugs
 * Concept index::          Index of concepts
@end menu
@@ -60,15 +62,17 @@ the one of lzip, bzip2 or gzip.

 Plzip can compress/decompress large files on multiprocessor machines
 much faster than lzip, at the cost of a slightly reduced compression
-ratio. Note that the number of usable threads is limited by file size;
-on files larger than a few GB plzip can use hundreds of processors, but
-on files of only a few MB plzip is no faster than lzip.
+ratio (0.4 to 2 percent larger compressed files). Note that the number
+of usable threads is limited by file size; on files larger than a few GB
+plzip can use hundreds of processors, but on files of only a few MB
+plzip is no faster than lzip (@pxref{Minimum file sizes}).

 Plzip uses the lzip file format; the files produced by plzip are fully
 compatible with lzip-1.4 or newer, and can be rescued with lziprecover.

-The lzip file format is designed for long-term data archiving, taking
-into account both data integrity and decoder availability:
+The lzip file format is designed for data sharing and long-term
+archiving, taking into account both data integrity and decoder
+availability:

@itemize @bullet
@item
@@ -87,8 +91,8 @@ data from a lzip file long after quantum computers eventually render
 LZMA obsolete.

@item
-Additionally lzip is copylefted, which guarantees that it will remain
-free forever.
+Additionally the lzip reference implementation is copylefted, which
+guarantees that it will remain free forever.
@end itemize

 A nice feature of the lzip format is that a corrupt byte is easier to
@@ -96,47 +100,15 @@ repair the nearer it is from the beginning of the file. Therefore, with
 the help of lziprecover, losing an entire archive just because of a
 corrupt byte near the beginning is a thing of the past.

-The member trailer stores the 32-bit CRC of the original data, the size
-of the original data and the size of the member. These values, together
-with the value remaining in the range decoder and the end-of-stream
-marker, provide a 4 factor integrity checking which guarantees that the
-decompressed version of the data is identical to the original. This
-guards against corruption of the compressed data, and against undetected
-bugs in plzip (hopefully very unlikely). The chances of data corruption
-going undetected are microscopic. Be aware, though, that the check
-occurs upon decompression, so it can only tell you that something is
-wrong. It can't help you recover the original uncompressed data.
-
 Plzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for other programs like
 tar or zutils.

-The amount of memory required @strong{per thread} is approximately the
-following:
-
-@itemize @bullet
-@item
-For compression; 3 times the data size (@pxref{--data-size}) plus 11
-times the dictionary size.
-
-@item
-For decompression or testing of a non-seekable file or of standard
-input; 2 times the dictionary size plus up to 32 MiB.
-
-@item
-For decompression of a regular file to a non-seekable file or to
-standard output; the dictionary size plus up to 32 MiB.
-
-@item
-For decompression of a regular file to another regular file, or for
-testing of a regular file; the dictionary size.
-@end itemize
-
 Plzip will automatically use the smallest possible dictionary size for
 each file without exceeding the given limit. Keep in mind that the
 decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
+choice of dictionary size limit (@pxref{Memory requirements}).

 When compressing, plzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
@@ -277,8 +249,8 @@ detect the number of processors in the system and use it as default
 value. @w{@samp{plzip --help}} shows the system's default value.

 Note that the number of usable threads is limited to @w{ceil( file_size
-/ data_size )} during compression (@pxref{--data-size}), and to the
-number of members in the input during decompression.
+/ data_size )} during compression (@pxref{Minimum file sizes}), and to
+the number of members in the input during decompression.

@item -o @var{file}
@itemx --output=@var{file}
@@ -315,8 +287,8 @@ Verbose mode.@*
 When compressing, show the compression ratio for each file processed. A
 second @samp{-v} shows the progress of compression.@*
 When decompressing or testing, further -v's (up to 4) increase the
-verbosity level, showing status, compression ratio, decompressed size,
-and compressed size.
+verbosity level, showing status, compression ratio, dictionary size,
+decompressed size, and compressed size.

@item -1 .. -9
 Set the compression parameters (dictionary size and match length limit)
@@ -327,8 +299,7 @@ The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
 etc, you may need to use the @samp{--match-length} and
@samp{--dictionary-size} options directly to achieve optimal
-performance. For example, @samp{-9m64} usually compresses executables
-more (and faster) than @samp{-9}.
+performance.

@multitable {Level} {Dictionary size} {Match length limit}
@item Level @tab Dictionary size @tab Match length limit
@@ -449,6 +420,73 @@ facilitates safe recovery of undamaged members from multi-member files.
@end table


+@node Memory requirements
+@chapter Memory required to compress and decompress
+@cindex memory requirements
+
+The amount of memory required @strong{per thread} is approximately the
+following:
+
+@itemize @bullet
+@item
+For compression; 11 times the dictionary size plus 3 times the data size
+(@pxref{--data-size}).
+
+@item
+For decompression of a regular (seekable) file to another regular file,
+or for testing of a regular file; the dictionary size. Note that regular
+files with more than 1024 bytes of trailing garbage are treated as
+non-seekable.
+
+@item
+For testing of a non-seekable file or of standard input; the dictionary
+size plus up to 5 MiB.
+
+@item
+For decompression of a regular file to a non-seekable file or to
+standard output; the dictionary size plus up to 32 MiB.
+
+@item
+For decompression of a non-seekable file or of standard input; the
+dictionary size plus up to 35 MiB.
+@end itemize
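
The per-thread figures listed above can be turned into a rough estimator. The following is only a sketch of the arithmetic stated in the manual text, not code taken from plzip; the function name and the mode labels are hypothetical and were chosen for this illustration.

```python
MIB = 1 << 20

# Extra memory (in MiB) added to the dictionary size for each
# decompression/testing case listed in the manual text.
# The mode labels are hypothetical names used only in this sketch.
EXTRA_MIB = {
    "decompress_regular_to_regular": 0,
    "test_regular": 0,
    "test_nonseekable": 5,
    "decompress_regular_to_nonseekable": 32,
    "decompress_nonseekable": 35,
}

def per_thread_memory(mode, dictionary_size, data_size=0):
    """Approximate upper bound, in bytes, of the memory used per thread."""
    if mode == "compress":
        # 11 times the dictionary size plus 3 times the data size.
        return 11 * dictionary_size + 3 * data_size
    # All other cases: dictionary size plus "up to" a fixed amount.
    return dictionary_size + EXTRA_MIB[mode] * MIB
```

For example, under these assumptions, compressing with an 8 MiB dictionary and a 16 MiB data size comes to roughly 11 * 8 + 3 * 16 = 136 MiB per thread.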
+
+
+@node Minimum file sizes
+@chapter Minimum file sizes required for full compression speed
+@cindex minimum file sizes
+
+When compressing, plzip divides the input file into chunks and
+compresses as many chunks simultaneously as there are worker threads,
+creating a multi-member compressed file.
+
+For this to work as expected (and roughly multiply the compression speed
+by the number of available processors), the uncompressed file must be at
+least as large as the number of worker threads times the chunk size
+(@pxref{--data-size}). Else some processors will not get any data to
+compress, and compression will be proportionally slower. The maximum
+speed increase achievable on a given file is limited by the ratio
+@w{(file_size / data_size)}.
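
As an illustration of the limit described in this paragraph, the sketch below computes how many worker threads can actually receive a chunk, using the ceil( file_size / data_size ) bound quoted elsewhere in the manual. The function name and parameters are hypothetical; this is not how plzip itself is implemented.

```python
def usable_threads(file_size, data_size, requested_threads):
    """Number of worker threads that can receive at least one chunk.

    The count is capped at ceil(file_size / data_size), computed here
    with integer arithmetic to avoid floating-point rounding.
    """
    chunks = -(-file_size // data_size)  # ceiling division
    return min(requested_threads, chunks)
```

With a 16 MiB data size, a 100 MiB file splits into 7 chunks, so requesting 8 threads would leave one of them idle, while a 1 GiB file (64 chunks) keeps all 8 busy.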
+
+The following table shows the minimum uncompressed file size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {128 MiB} {128 MiB} {128 MiB} {128 MiB} {128 MiB} {128 MiB}
+@headitem Processors @tab 2 @tab 3 @tab 4 @tab 8 @tab 16 @tab 64
+@item Level
+@item -1 @tab   4 MiB @tab   6 MiB @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab 128 MiB
+@item -2 @tab   6 MiB @tab   9 MiB @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab 192 MiB
+@item -3 @tab   8 MiB @tab  12 MiB @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 256 MiB
+@item -4 @tab  12 MiB @tab  18 MiB @tab  24 MiB @tab  48 MiB @tab  96 MiB @tab 384 MiB
+@item -5 @tab  16 MiB @tab  24 MiB @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 512 MiB
+@item -6 @tab  32 MiB @tab  48 MiB @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab   1 GiB
+@item -7 @tab  64 MiB @tab  96 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   2 GiB
+@item -8 @tab  96 MiB @tab 144 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab   3 GiB
+@item -9 @tab 128 MiB @tab 192 MiB @tab 256 MiB @tab 512 MiB @tab   1 GiB @tab   4 GiB
+@end multitable
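
The rows of this table follow from multiplying the number of processors by the data size of each level. The sketch below reproduces them; the per-level data sizes are not stated in this hunk and are inferred here by halving the 2-processor column, so treat them as an assumption. All names are hypothetical.

```python
# Data size per level in MiB, inferred from the 2-processor column
# of the table (minimum size / 2). This is an assumption of the sketch.
DATA_SIZE_MIB = {1: 2, 2: 3, 3: 4, 4: 6, 5: 8, 6: 16, 7: 32, 8: 48, 9: 64}
PROCESSORS = (2, 3, 4, 8, 16, 64)

def min_size_row(level):
    """Minimum uncompressed sizes (MiB) for full use of each processor count."""
    return [DATA_SIZE_MIB[level] * n for n in PROCESSORS]

def fmt(mib):
    """Format a size the way the table does: whole GiB when possible."""
    if mib >= 1024 and mib % 1024 == 0:
        return f"{mib // 1024} GiB"
    return f"{mib} MiB"

if __name__ == "__main__":
    for level in sorted(DATA_SIZE_MIB):
        print(f"-{level}", *(fmt(m) for m in min_size_row(level)), sep="  ")
```

Running the script prints one line per level with the same values as the table rows.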
+
+
@node Problems
@chapter Reporting bugs
@cindex bugs