
Merging upstream version 1.8.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-24 04:16:09 +01:00
parent 95e76700ee
commit 3ab3342c4f
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
21 changed files with 729 additions and 460 deletions

@ -6,19 +6,19 @@
@finalout
@c %**end of header
@set UPDATED 7 February 2018
@set VERSION 1.7
@set UPDATED 5 January 2019
@set VERSION 1.8
@dircategory Data Compression
@direntry
* Plzip: (plzip). Parallel compressor compatible with lzip
* Plzip: (plzip). Massively parallel implementation of lzip
@end direntry
@ifnothtml
@titlepage
@title Plzip
@subtitle Parallel compressor compatible with lzip
@subtitle Massively parallel implementation of lzip
@subtitle for Plzip version @value{VERSION}, @value{UPDATED}
@author by Antonio Diaz Diaz
@ -49,7 +49,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
Copyright @copyright{} 2009-2018 Antonio Diaz Diaz.
Copyright @copyright{} 2009-2019 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission
to copy, distribute and modify it.
@ -59,23 +59,28 @@ to copy, distribute and modify it.
@chapter Introduction
@cindex introduction
Plzip is a massively parallel (multi-threaded) lossless data compressor
based on the lzlib compression library, with a user interface similar to
the one of lzip, bzip2 or gzip.
@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip} is a massively parallel
(multi-threaded) implementation of lzip, fully compatible with lzip 1.4 or
newer. Plzip uses the lzlib compression library.
@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} is a lossless data
compressor with a user interface similar to the one of gzip or bzip2. Lzip
can compress about as fast as gzip @w{(lzip -0)} or compress most files more
than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between gzip
and bzip2. Lzip is better than gzip and bzip2 from a data recovery
perspective. Lzip has been designed, written and tested with great care to
replace gzip and bzip2 as the standard general-purpose compressed format for
unix-like systems.
Plzip can compress/decompress large files on multiprocessor machines
much faster than lzip, at the cost of a slightly reduced compression
ratio (0.4 to 2 percent larger compressed files). Note that the number
of usable threads is limited by file size; on files larger than a few GB
plzip can use hundreds of processors, but on files of only a few MB
plzip is no faster than lzip (@pxref{Minimum file sizes}).
plzip is no faster than lzip. @xref{Minimum file sizes}.
Plzip uses the lzip file format; the files produced by plzip are fully
compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder
availability:
The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:
@itemize @bullet
@item
@ -107,15 +112,14 @@ repair the nearer it is from the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.
Plzip uses the same well-defined exit status values used by lzip and
bzip2, which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs like
tar or zutils.
Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.
Plzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit (@pxref{Memory requirements}).
Plzip will automatically use for each file the largest dictionary size that
exceeds neither the file size nor the limit given. Keep in mind that
the decompression memory requirement is affected at compression time by the
choice of dictionary size limit. @xref{Memory requirements}.
When compressing, plzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
@ -130,7 +134,7 @@ file from that of the compressed file as follows:
(De)compressing a file is much like copying or moving it; therefore plzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as "cp -p" does. (If the user ID or
possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).
@ -142,10 +146,10 @@ standard input to standard output. In this case, plzip will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
Plzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
corresponding decompressed files. Integrity testing of concatenated
compressed files is also supported.
Plzip will correctly decompress a file which is the concatenation of two or
more compressed files. The result is the concatenation of the corresponding
decompressed files. Integrity testing of concatenated compressed files is
also supported.
@node Output
@ -225,6 +229,7 @@ Print an informative help message describing the options and exit.
@item -V
@itemx --version
Print the version number of plzip on the standard output and exit.
This version number should be included in all bug reports.
@anchor{--trailing-error}
@item -a
@ -322,12 +327,13 @@ Quiet operation. Suppress all messages.
@item -s @var{bytes}
@itemx --dictionary-size=@var{bytes}
When compressing, set the dictionary size limit in bytes. Plzip will use
the smallest possible dictionary size for each file without exceeding
this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
that dictionary sizes are quantized. If the specified size does not
match one of the valid sizes, it will be rounded upwards by adding up to
@w{(@var{bytes} / 8)} to it.
for each file the largest dictionary size that exceeds neither
the file size nor this limit. Valid values range from @w{4 KiB} to
@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
coded in just one byte (@pxref{coded-dict-size}). If the specified size
does not match one of the valid sizes, it will be rounded upwards by
adding up to @w{(@var{bytes} / 8)} to it.
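
The rounding rule above can be illustrated with a minimal Python sketch. It is not plzip's code: the function name `round_up_dict_size` is hypothetical, and it assumes that "rounded upwards" means choosing the smallest valid size that is not below the request, where the valid sizes are those expressible in the one-byte coded form (a power of two from 2^12 to 2^29 minus 0/16 to 7/16 of itself).

```python
MIN_DICT_SIZE = 1 << 12   # 4 KiB
MAX_DICT_SIZE = 1 << 29   # 512 MiB


def round_up_dict_size(requested: int) -> int:
    """Return the smallest valid dictionary size >= requested (assumed rule)."""
    # Values 12 to 29 are interpreted as powers of two.
    if 12 <= requested <= 29:
        requested = 1 << requested
    if not MIN_DICT_SIZE <= requested <= MAX_DICT_SIZE:
        raise ValueError("dictionary size out of range")
    # Walk the valid sizes in ascending order: for each base size 2^n,
    # subtract 7/16, 6/16, ..., 0/16 of the base size.
    for log2_base in range(12, 30):
        base = 1 << log2_base
        for numerator in range(7, -1, -1):
            size = base - numerator * (base // 16)
            if size >= requested:
                return size
    raise AssertionError("unreachable: 512 MiB is always a valid size")


if __name__ == "__main__":
    print(round_up_dict_size(300 * 1024) // 1024, "KiB")
```

Under this assumption the amount added never exceeds one eighth of the requested size, which agrees with the bound stated above.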
For maximum compression you should use a dictionary size limit as large
as possible, but keep in mind that the decompression memory requirement
@ -354,18 +360,23 @@ Two or more @samp{-v} options show the progress of (de)compression,
except for single-member files.
@item -0 .. -9
Set the compression parameters (dictionary size and match length limit)
as shown in the table below. The default compression level is @samp{-6}.
Note that @samp{-9} can be much slower than @samp{-0}. These options
have no effect when decompressing, testing or listing.
Compression level. Set the compression parameters (dictionary size and
match length limit) as shown in the table below. The default compression
level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
@samp{-9} can be much slower than @samp{-0}. These options have no
effect when decompressing, testing or listing.
The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc., you may need to use the @samp{--dictionary-size} and
@samp{--match-length} options directly to achieve optimal performance.
@multitable {Level} {Dictionary size} {Match length limit}
@item Level @tab Dictionary size @tab Match length limit
If several compression levels or @samp{-s} or @samp{-m} options are
given, the last setting is used. For example, @w{@samp{-9 -s64MiB}} is
equivalent to @w{@samp{-s64MiB -m273}}.
@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
@item -0 @tab 64 KiB @tab 16 bytes
@item -1 @tab 1 MiB @tab 5 bytes
@item -2 @tab 1.5 MiB @tab 6 bytes
@ -388,6 +399,18 @@ bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
"corrupt header" error and the cause is not indeed a corrupt header.
@item --in-slots=@var{n}
Number of @w{1 MiB} input packets buffered per worker thread when
decompressing from non-seekable input. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid values
range from 1 to 64. The default value is 4.
@item --out-slots=@var{n}
Number of @w{1 MiB} output packets buffered per worker thread when
decompressing to non-seekable output. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid values
range from 1 to 1024. The default value is 64.
@end table
Numbers given as arguments to options may be followed by a multiplier
@ -506,12 +529,13 @@ A four byte string, identifying the lzip format, with the value "LZIP"
@item VN (version number, 1 byte)
Just in case something needs to be modified in the future. 1 for now.
@anchor{coded-dict-size}
@item DS (coded dictionary size, 1 byte)
The dictionary size is calculated by taking a power of 2 (the base size)
and substracting from it a fraction between 0/16 and 7/16 of the base
and subtracting from it a fraction between 0/16 and 7/16 of the base
size.@*
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
Bits 7-5 contain the numerator of the fraction (0 to 7) to substract
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
from the base size to obtain the dictionary size.@*
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
Valid values for dictionary size range from 4 KiB to 512 MiB.
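
The DS byte layout described above can be checked with a short Python sketch. The function name `decode_dict_size` is hypothetical and this is not lzlib's implementation; it only follows the bit layout and the valid range given in the text.

```python
def decode_dict_size(ds: int) -> int:
    """Decode the one-byte coded dictionary size (DS) into a size in bytes."""
    if not 0 <= ds <= 0xFF:
        raise ValueError("DS must fit in one byte")
    log2_base = ds & 0x1F        # bits 4-0: base 2 logarithm of the base size
    numerator = ds >> 5          # bits 7-5: numerator of the fraction (0 to 7)
    if not 12 <= log2_base <= 29:
        raise ValueError("base size logarithm must be 12 to 29")
    base = 1 << log2_base
    size = base - numerator * (base // 16)
    # Valid dictionary sizes range from 4 KiB to 512 MiB.
    if size < (1 << 12):
        raise ValueError("dictionary size below 4 KiB")
    return size


if __name__ == "__main__":
    # The example from the text: 0xD3 = 2^19 - 6 * 2^15 = 320 KiB.
    print(decode_dict_size(0xD3) // 1024, "KiB")
```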
@ -546,8 +570,8 @@ facilitates safe recovery of undamaged members from multimember files.
@chapter Memory required to compress and decompress
@cindex memory requirements
The amount of memory required @strong{per thread} for decompression or
testing is approximately the following:
The amount of memory required @strong{per worker thread} for
decompression or testing is approximately the following:
@itemize @bullet
@item
@ -556,20 +580,23 @@ or for testing of a regular file; the dictionary size.
@item
For testing of a non-seekable file or of standard input; the dictionary
size plus up to @w{5 MiB}.
size plus @w{1 MiB} plus up to the number of @w{1 MiB} input packets
buffered (4 by default).
@item
For decompression of a regular file to a non-seekable file or to
standard output; the dictionary size plus up to @w{32 MiB}.
standard output; the dictionary size plus up to the number of @w{1 MiB}
output packets buffered (64 by default).
@item
For decompression of a non-seekable file or of standard input; the
dictionary size plus up to @w{35 MiB}.
dictionary size plus @w{1 MiB} plus up to the number of @w{1 MiB} input
and output packets buffered (68 by default).
@end itemize
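
As a rough illustration, the four cases listed above can be turned into an upper-bound estimate per worker thread. This Python sketch is only a reading of the list, not plzip's accounting: the function and parameter names are hypothetical, and the slot defaults and ranges are taken from the descriptions of @samp{--in-slots} and @samp{--out-slots}.

```python
MIB = 1 << 20


def max_decompression_memory_per_worker(
    dict_size: int,
    *,
    input_seekable: bool,
    testing: bool = False,
    output_seekable: bool = True,
    in_slots: int = 4,
    out_slots: int = 64,
) -> int:
    """Approximate upper bound, in bytes, per worker thread."""
    if not 1 <= in_slots <= 64:
        raise ValueError("in_slots must be 1 to 64")
    if not 1 <= out_slots <= 1024:
        raise ValueError("out_slots must be 1 to 1024")
    if input_seekable:
        if testing or output_seekable:
            # Regular file to regular file, or testing of a regular file.
            return dict_size
        # Regular file to a non-seekable file or to standard output.
        return dict_size + out_slots * MIB
    if testing:
        # Testing of a non-seekable file or of standard input.
        return dict_size + MIB + in_slots * MIB
    # Decompression of a non-seekable file or of standard input.
    return dict_size + MIB + (in_slots + out_slots) * MIB


if __name__ == "__main__":
    bound = max_decompression_memory_per_worker(8 * MIB, input_seekable=False)
    print(bound // MIB, "MiB per worker thread")
```

With an @w{8 MiB} dictionary and the default slot counts, decompressing from standard input gives a bound of 8 + 1 + 68 = @w{77 MiB} per worker thread under these assumptions.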
@noindent
The amount of memory required @strong{per thread} for compression is
approximately the following:
The amount of memory required @strong{per worker thread} for compression
is approximately the following:
@itemize @bullet
@item
@ -696,9 +723,12 @@ where a file containing trailing data must be rejected, the option
WARNING! Even if plzip is bug-free, other causes may result in a corrupt
compressed file (bugs in the system libraries, memory errors, etc).
Therefore, if the data you are going to compress are important, give the
@samp{--keep} option to plzip and don't remove the original file until
you verify the compressed file with a command like
@w{@samp{plzip -cd file.lz | cmp file -}}.
@samp{--keep} option to plzip and don't remove the original file until you
verify the compressed file with a command like
@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during
compression can only be detected by comparing the compressed file with the
original because the corruption happens before plzip compresses the RAM
contents, resulting in a valid compressed file containing wrong data.
@sp 1
@noindent