Merging upstream version 1.8.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 95e76700ee
commit 3ab3342c4f
21 changed files with 729 additions and 460 deletions
134
doc/plzip.texi
@@ -6,19 +6,19 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 7 February 2018
-@set VERSION 1.7
+@set UPDATED 5 January 2019
+@set VERSION 1.8
 
 @dircategory Data Compression
 @direntry
-* Plzip: (plzip). Parallel compressor compatible with lzip
+* Plzip: (plzip). Massively parallel implementation of lzip
 @end direntry
 
 
 @ifnothtml
 @titlepage
 @title Plzip
-@subtitle Parallel compressor compatible with lzip
+@subtitle Massively parallel implementation of lzip
 @subtitle for Plzip version @value{VERSION}, @value{UPDATED}
 @author by Antonio Diaz Diaz
 
@@ -49,7 +49,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
 @end menu
 
 @sp 1
-Copyright @copyright{} 2009-2018 Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2019 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission
 to copy, distribute and modify it.
@@ -59,23 +59,28 @@ to copy, distribute and modify it.
 @chapter Introduction
 @cindex introduction
 
-Plzip is a massively parallel (multi-threaded) lossless data compressor
-based on the lzlib compression library, with a user interface similar to
-the one of lzip, bzip2 or gzip.
+@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip} is a massively parallel
+(multi-threaded) implementation of lzip, fully compatible with lzip 1.4 or
+newer. Plzip uses the lzlib compression library.
+
+@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} is a lossless data
+compressor with a user interface similar to the one of gzip or bzip2. Lzip
+can compress about as fast as gzip @w{(lzip -0)} or compress most files more
+than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between gzip
+and bzip2. Lzip is better than gzip and bzip2 from a data recovery
+perspective. Lzip has been designed, written and tested with great care to
+replace gzip and bzip2 as the standard general-purpose compressed format for
+unix-like systems.
 
 Plzip can compress/decompress large files on multiprocessor machines
 much faster than lzip, at the cost of a slightly reduced compression
 ratio (0.4 to 2 percent larger compressed files). Note that the number
 of usable threads is limited by file size; on files larger than a few GB
 plzip can use hundreds of processors, but on files of only a few MB
-plzip is no faster than lzip (@pxref{Minimum file sizes}).
+plzip is no faster than lzip. @xref{Minimum file sizes}.
 
-Plzip uses the lzip file format; the files produced by plzip are fully
-compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
-
-The lzip file format is designed for data sharing and long-term
-archiving, taking into account both data integrity and decoder
-availability:
+The lzip file format is designed for data sharing and long-term archiving,
+taking into account both data integrity and decoder availability:
 
 @itemize @bullet
 @item
@@ -107,15 +112,14 @@ repair the nearer it is from the beginning of the file. Therefore, with
 the help of lziprecover, losing an entire archive just because of a
 corrupt byte near the beginning is a thing of the past.
 
-Plzip uses the same well-defined exit status values used by lzip and
-bzip2, which makes it safer than compressors returning ambiguous warning
-values (like gzip) when it is used as a back end for other programs like
-tar or zutils.
+Plzip uses the same well-defined exit status values used by lzip, which
+makes it safer than compressors returning ambiguous warning values (like
+gzip) when it is used as a back end for other programs like tar or zutils.
 
-Plzip will automatically use the smallest possible dictionary size for
-each file without exceeding the given limit. Keep in mind that the
-decompression memory requirement is affected at compression time by the
-choice of dictionary size limit (@pxref{Memory requirements}).
+Plzip will automatically use for each file the largest dictionary size that
+does not exceed neither the file size nor the limit given. Keep in mind that
+the decompression memory requirement is affected at compression time by the
+choice of dictionary size limit. @xref{Memory requirements}.
 
 When compressing, plzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
@@ -130,7 +134,7 @@ file from that of the compressed file as follows:
 
 (De)compressing a file is much like copying or moving it; therefore plzip
 preserves the access and modification dates, permissions, and, when
-possible, ownership of the file just as "cp -p" does. (If the user ID or
+possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
 the group ID can't be duplicated, the file permission bits S_ISUID and
 S_ISGID are cleared).
 
@@ -142,10 +146,10 @@ standard input to standard output. In this case, plzip will decline to
 write compressed output to a terminal, as this would be entirely
 incomprehensible and therefore pointless.
 
-Plzip will correctly decompress a file which is the concatenation of two
-or more compressed files. The result is the concatenation of the
-corresponding decompressed files. Integrity testing of concatenated
-compressed files is also supported.
+Plzip will correctly decompress a file which is the concatenation of two or
+more compressed files. The result is the concatenation of the corresponding
+decompressed files. Integrity testing of concatenated compressed files is
+also supported.
 
 
 @node Output
@@ -225,6 +229,7 @@ Print an informative help message describing the options and exit.
 @item -V
 @itemx --version
 Print the version number of plzip on the standard output and exit.
+This version number should be included in all bug reports.
 
 @anchor{--trailing-error}
 @item -a
@@ -322,12 +327,13 @@ Quiet operation. Suppress all messages.
 @item -s @var{bytes}
 @itemx --dictionary-size=@var{bytes}
 When compressing, set the dictionary size limit in bytes. Plzip will use
-the smallest possible dictionary size for each file without exceeding
-this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
-to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
-that dictionary sizes are quantized. If the specified size does not
-match one of the valid sizes, it will be rounded upwards by adding up to
-@w{(@var{bytes} / 8)} to it.
+for each file the largest dictionary size that does not exceed neither
+the file size nor this limit. Valid values range from @w{4 KiB} to
+@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
+2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
+coded in just one byte (@pxref{coded-dict-size}). If the specified size
+does not match one of the valid sizes, it will be rounded upwards by
+adding up to @w{(@var{bytes} / 8)} to it.
 
 For maximum compression you should use a dictionary size limit as large
 as possible, but keep in mind that the decompression memory requirement
@@ -354,18 +360,23 @@ Two or more @samp{-v} options show the progress of (de)compression,
 except for single-member files.
 
 @item -0 .. -9
-Set the compression parameters (dictionary size and match length limit)
-as shown in the table below. The default compression level is @samp{-6}.
-Note that @samp{-9} can be much slower than @samp{-0}. These options
-have no effect when decompressing, testing or listing.
+Compression level. Set the compression parameters (dictionary size and
+match length limit) as shown in the table below. The default compression
+level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
+@samp{-9} can be much slower than @samp{-0}. These options have no
+effect when decompressing, testing or listing.
 
 The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
 etc, you may need to use the @samp{--dictionary-size} and
 @samp{--match-length} options directly to achieve optimal performance.
 
-@multitable {Level} {Dictionary size} {Match length limit}
-@item Level @tab Dictionary size @tab Match length limit
+If several compression levels or @samp{-s} or @samp{-m} options are
+given, the last setting is used. For example @w{@samp{-9 -s64MiB}} is
+equivalent to @w{@samp{-s64MiB -m273}}
+
+@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
+@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
 @item -0 @tab 64 KiB @tab 16 bytes
 @item -1 @tab 1 MiB @tab 5 bytes
 @item -2 @tab 1.5 MiB @tab 6 bytes
@@ -388,6 +399,18 @@ bytes are so similar to the magic bytes of a lzip header that they can
 be confused with a corrupt header. Use this option if a file triggers a
 "corrupt header" error and the cause is not indeed a corrupt header.
 
+@item --in-slots=@var{n}
+Number of @w{1 MiB} input packets buffered per worker thread when
+decompressing from non-seekable input. Increasing the number of packets
+may increase decompression speed, but requires more memory. Valid values
+range from 1 to 64. The default value is 4.
+
+@item --out-slots=@var{n}
+Number of @w{1 MiB} output packets buffered per worker thread when
+decompressing to non-seekable output. Increasing the number of packets
+may increase decompression speed, but requires more memory. Valid values
+range from 1 to 1024. The default value is 64.
+
 @end table
 
 Numbers given as arguments to options may be followed by a multiplier
@@ -506,12 +529,13 @@ A four byte string, identifying the lzip format, with the value "LZIP"
 @item VN (version number, 1 byte)
 Just in case something needs to be modified in the future. 1 for now.
 
+@anchor{coded-dict-size}
 @item DS (coded dictionary size, 1 byte)
 The dictionary size is calculated by taking a power of 2 (the base size)
-and substracting from it a fraction between 0/16 and 7/16 of the base
+and subtracting from it a fraction between 0/16 and 7/16 of the base
 size.@*
 Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
-Bits 7-5 contain the numerator of the fraction (0 to 7) to substract
+Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
 from the base size to obtain the dictionary size.@*
 Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
 Valid values for dictionary size range from 4 KiB to 512 MiB.
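The DS encoding quoted in the hunk above can be illustrated with a small decoder. This is only a sketch written from that description, not code taken from plzip or lzlib; the function name `decode_dict_size` is hypothetical, and rejecting results below 4 KiB is an assumption derived from the stated valid range (a real decoder may treat that corner case differently).

```python
def decode_dict_size(ds: int) -> int:
    """Decode a one-byte coded dictionary size (DS) into a size in bytes."""
    if not 0 <= ds <= 0xFF:
        raise ValueError("DS must fit in one byte")
    log2_base = ds & 0x1F   # bits 4-0: base 2 logarithm of the base size
    numerator = ds >> 5     # bits 7-5: numerator of the fraction (0 to 7)
    if not 12 <= log2_base <= 29:
        raise ValueError("base size logarithm must be in the range 12 to 29")
    base = 1 << log2_base
    # Subtract numerator/16 of the base size from the base size.
    size = base - numerator * (base // 16)
    # Assumption: enforce the documented 4 KiB lower bound on the result.
    if size < 4 * 1024:
        raise ValueError("dictionary size below 4 KiB")
    return size


if __name__ == "__main__":
    # The example from the manual: 0xD3 = 2^19 - 6 * 2^15 = 320 KiB
    print(decode_dict_size(0xD3) // 1024, "KiB")
```

Because the base logarithm tops out at 29 and the fraction can only shrink the size, no explicit upper-bound check against 512 MiB is needed in this sketch.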
@@ -546,8 +570,8 @@ facilitates safe recovery of undamaged members from multimember files.
 @chapter Memory required to compress and decompress
 @cindex memory requirements
 
-The amount of memory required @strong{per thread} for decompression or
-testing is approximately the following:
+The amount of memory required @strong{per worker thread} for
+decompression or testing is approximately the following:
 
 @itemize @bullet
 @item
@@ -556,20 +580,23 @@ or for testing of a regular file; the dictionary size.
 
 @item
 For testing of a non-seekable file or of standard input; the dictionary
-size plus up to @w{5 MiB}.
+size plus @w{1 MiB} plus up to the number of @w{1 MiB} input packets
+buffered (4 by default).
 
 @item
 For decompression of a regular file to a non-seekable file or to
-standard output; the dictionary size plus up to @w{32 MiB}.
+standard output; the dictionary size plus up to the number of @w{1 MiB}
+output packets buffered (64 by default).
 
 @item
 For decompression of a non-seekable file or of standard input; the
-dictionary size plus up to @w{35 MiB}.
+dictionary size plus @w{1 MiB} plus up to the number of @w{1 MiB} input
+and output packets buffered (68 by default).
 @end itemize
 
 @noindent
-The amount of memory required @strong{per thread} for compression is
-approximately the following:
+The amount of memory required @strong{per worker thread} for compression
+is approximately the following:
 
 @itemize @bullet
 @item
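The revised per-worker-thread decompression figures in the hunk above reduce to simple arithmetic on the dictionary size and the `--in-slots` / `--out-slots` values. The following is a rough sketch of that arithmetic only; the function name and parameters are hypothetical, and the regular-file-to-regular-file case is assumed to cost just the dictionary size, since that list item is only partly visible in the diff context.

```python
MIB = 1 << 20  # size of one buffered packet, per the manual


def decompression_memory_bound(dict_size: int,
                               seekable_input: bool,
                               seekable_output: bool,
                               testing: bool = False,
                               in_slots: int = 4,
                               out_slots: int = 64) -> int:
    """Approximate upper bound, in bytes, of memory per worker thread."""
    if seekable_input:
        if testing or seekable_output:
            # Testing a regular file (and, by assumption, regular to regular).
            extra = 0
        else:
            # Regular file to non-seekable output: buffered output packets.
            extra = out_slots * MIB
    elif testing:
        # Testing non-seekable input: 1 MiB plus buffered input packets.
        extra = MIB + in_slots * MIB
    else:
        # Decompressing non-seekable input: 1 MiB plus input and output packets.
        extra = MIB + (in_slots + out_slots) * MIB
    return dict_size + extra


if __name__ == "__main__":
    bound = decompression_memory_bound(8 * MIB, seekable_input=False,
                                       seekable_output=False)
    print(bound // MIB, "MiB per worker thread")
```

With the default slot counts and an 8 MiB dictionary, the non-seekable decompression case works out to 8 + 1 + 68 MiB per worker thread.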
@@ -696,9 +723,12 @@ where a file containing trailing data must be rejected, the option
 WARNING! Even if plzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
 Therefore, if the data you are going to compress are important, give the
-@samp{--keep} option to plzip and don't remove the original file until
-you verify the compressed file with a command like
-@w{@samp{plzip -cd file.lz | cmp file -}}.
+@samp{--keep} option to plzip and don't remove the original file until you
+verify the compressed file with a command like
+@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during
+compression can only be detected by comparing the compressed file with the
+original because the corruption happens before plzip compresses the RAM
+contents, resulting in a valid compressed file containing wrong data.
 
 @sp 1
 @noindent