Merging upstream version 1.3~pre1.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent f04d94e9dd
commit e4e17ab53e
17 changed files with 387 additions and 259 deletions
134
doc/plzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 29 August 2014
-@set VERSION 1.2
+@set UPDATED 25 November 2014
+@set VERSION 1.3-pre1
 
 @dircategory Data Compression
 @direntry
@@ -39,6 +39,8 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
 * Program design:: Internal structure of plzip
 * Invoking plzip:: Command line interface
 * File format:: Detailed format of the compressed file
+* Memory requirements:: Memory required to compress and decompress
+* Minimum file sizes:: Minimum file sizes required for full speed
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
 @end menu
@@ -60,15 +62,17 @@ the one of lzip, bzip2 or gzip.
 
 Plzip can compress/decompress large files on multiprocessor machines
 much faster than lzip, at the cost of a slightly reduced compression
-ratio. Note that the number of usable threads is limited by file size;
-on files larger than a few GB plzip can use hundreds of processors, but
-on files of only a few MB plzip is no faster than lzip.
+ratio (0.4 to 2 percent larger compressed files). Note that the number
+of usable threads is limited by file size; on files larger than a few GB
+plzip can use hundreds of processors, but on files of only a few MB
+plzip is no faster than lzip (@pxref{Minimum file sizes}).
 
 Plzip uses the lzip file format; the files produced by plzip are fully
 compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
 
-The lzip file format is designed for long-term data archiving, taking
-into account both data integrity and decoder availability:
+The lzip file format is designed for data sharing and long-term
+archiving, taking into account both data integrity and decoder
+availability:
 
 @itemize @bullet
 @item
@@ -87,8 +91,8 @@ data from a lzip file long after quantum computers eventually render
 LZMA obsolete.
 
 @item
-Additionally lzip is copylefted, which guarantees that it will remain
-free forever.
+Additionally the lzip reference implementation is copylefted, which
+guarantees that it will remain free forever.
 @end itemize
 
 A nice feature of the lzip format is that a corrupt byte is easier to
@@ -96,47 +100,15 @@ repair the nearer it is from the beginning of the file. Therefore, with
 the help of lziprecover, losing an entire archive just because of a
 corrupt byte near the beginning is a thing of the past.
 
-The member trailer stores the 32-bit CRC of the original data, the size
-of the original data and the size of the member. These values, together
-with the value remaining in the range decoder and the end-of-stream
-marker, provide a 4 factor integrity checking which guarantees that the
-decompressed version of the data is identical to the original. This
-guards against corruption of the compressed data, and against undetected
-bugs in plzip (hopefully very unlikely). The chances of data corruption
-going undetected are microscopic. Be aware, though, that the check
-occurs upon decompression, so it can only tell you that something is
-wrong. It can't help you recover the original uncompressed data.
-
 Plzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for other programs like
 tar or zutils.
 
-The amount of memory required @strong{per thread} is approximately the
-following:
-
-@itemize @bullet
-@item
-For compression; 3 times the data size (@pxref{--data-size}) plus 11
-times the dictionary size.
-
-@item
-For decompression or testing of a non-seekable file or of standard
-input; 2 times the dictionary size plus up to 32 MiB.
-
-@item
-For decompression of a regular file to a non-seekable file or to
-standard output; the dictionary size plus up to 32 MiB.
-
-@item
-For decompression of a regular file to another regular file, or for
-testing of a regular file; the dictionary size.
-@end itemize
-
 Plzip will automatically use the smallest possible dictionary size for
 each file without exceeding the given limit. Keep in mind that the
 decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
+choice of dictionary size limit (@pxref{Memory requirements}).
 
 When compressing, plzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
@@ -277,8 +249,8 @@ detect the number of processors in the system and use it as default
 value. @w{@samp{plzip --help}} shows the system's default value.
 
 Note that the number of usable threads is limited to @w{ceil( file_size
-/ data_size )} during compression (@pxref{--data-size}), and to the
-number of members in the input during decompression.
+/ data_size )} during compression (@pxref{Minimum file sizes}), and to
+the number of members in the input during decompression.
 
 @item -o @var{file}
 @itemx --output=@var{file}
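The compression-time thread limit quoted in the hunk above, ceil( file_size / data_size ), can be sketched as follows. This is only an illustration, assuming sizes in bytes; the function name and its parameters are hypothetical and not part of plzip.

```python
MIB = 1 << 20  # bytes per MiB


def usable_compression_threads(file_size: int, data_size: int, requested: int) -> int:
    """Return min(requested, ceil(file_size / data_size)).

    Hypothetical helper for illustration; plzip exposes no such function.
    All sizes are in bytes.
    """
    if file_size < 0 or data_size <= 0 or requested <= 0:
        raise ValueError("file_size must be >= 0; data_size and requested must be > 0")
    chunks = -(-file_size // data_size)  # ceiling division without floats
    return min(requested, chunks)


if __name__ == "__main__":
    # A 10 MiB file split into 4 MiB chunks yields 3 chunks,
    # so at most 3 of the 8 requested threads receive data.
    print(usable_compression_threads(10 * MIB, 4 * MIB, 8))
```

Integer ceiling division is used so that very large file sizes are not subject to floating-point rounding.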
@@ -315,8 +287,8 @@ Verbose mode.@*
 When compressing, show the compression ratio for each file processed. A
 second @samp{-v} shows the progress of compression.@*
 When decompressing or testing, further -v's (up to 4) increase the
-verbosity level, showing status, compression ratio, decompressed size,
-and compressed size.
+verbosity level, showing status, compression ratio, dictionary size,
+decompressed size, and compressed size.
 
 @item -1 .. -9
 Set the compression parameters (dictionary size and match length limit)
@@ -327,8 +299,7 @@ The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
 etc, you may need to use the @samp{--match-length} and
 @samp{--dictionary-size} options directly to achieve optimal
-performance. For example, @samp{-9m64} usually compresses executables
-more (and faster) than @samp{-9}.
+performance.
 
 @multitable {Level} {Dictionary size} {Match length limit}
 @item Level @tab Dictionary size @tab Match length limit
@@ -449,6 +420,73 @@ facilitates safe recovery of undamaged members from multi-member files.
 @end table
 
 
+@node Memory requirements
+@chapter Memory required to compress and decompress
+@cindex memory requirements
+
+The amount of memory required @strong{per thread} is approximately the
+following:
+
+@itemize @bullet
+@item
+For compression; 11 times the dictionary size plus 3 times the data size
+(@pxref{--data-size}).
+
+@item
+For decompression of a regular (seekable) file to another regular file,
+or for testing of a regular file; the dictionary size. Note that regular
+files with more than 1024 bytes of trailing garbage are treated as
+non-seekable.
+
+@item
+For testing of a non-seekable file or of standard input; the dictionary
+size plus up to 5 MiB.
+
+@item
+For decompression of a regular file to a non-seekable file or to
+standard output; the dictionary size plus up to 32 MiB.
+
+@item
+For decompression of a non-seekable file or of standard input; the
+dictionary size plus up to 35 MiB.
+@end itemize
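The per-thread figures in the list above can be turned into a rough estimator. The sketch below is an illustration only: the mode names and the function are hypothetical, and the constants are copied from the added Texinfo text rather than from plzip's source.

```python
MIB = 1 << 20  # bytes per MiB

# Extra memory on top of the dictionary size, per the list in the manual.
# The mode names are invented for this sketch.
_EXTRA_BYTES = {
    "decompress_regular_to_regular": 0,
    "test_regular": 0,
    "test_nonseekable": 5 * MIB,
    "decompress_regular_to_nonseekable": 32 * MIB,
    "decompress_nonseekable": 35 * MIB,
}


def memory_per_thread(mode: str, dictionary_size: int, data_size: int = 0) -> int:
    """Approximate upper bound, in bytes, of memory needed per thread."""
    if mode == "compress":
        # 11 times the dictionary size plus 3 times the data size.
        return 11 * dictionary_size + 3 * data_size
    if mode not in _EXTRA_BYTES:
        raise ValueError(f"unknown mode: {mode}")
    return dictionary_size + _EXTRA_BYTES[mode]


if __name__ == "__main__":
    # Example: an 8 MiB dictionary with 16 MiB data blocks.
    print(memory_per_thread("compress", 8 * MIB, 16 * MIB) // MIB, "MiB per thread")
```

Multiplying the result by the number of worker threads gives an approximate total, since the manual states the requirement per thread.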
+
+
+@node Minimum file sizes
+@chapter Minimum file sizes required for full compression speed
+@cindex minimum file sizes
+
+When compressing, plzip divides the input file into chunks and
+compresses as many chunks simultaneously as worker threads are chosen,
+creating a multi-member compressed file.
+
+For this to work as expected (and roughly multiply the compression speed
+by the number of available processors), the uncompressed file must be at
+least as large as the number of worker threads times the chunk size
+(@pxref{--data-size}). Else some processors will not get any data to
+compress, and compression will be proportionally slower. The maximum
+speed increase achievable on a given file is limited by the ratio
+@w{(file_size / data_size)}.
+
+The following table shows the minimum uncompressed file size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {128 MiB} {128 MiB} {128 MiB} {128 MiB} {128 MiB} {128 MiB}
+@headitem Processors @tab 2 @tab 3 @tab 4 @tab 8 @tab 16 @tab 64
+@item Level
+@item -1 @tab 4 MiB @tab 6 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB
+@item -2 @tab 6 MiB @tab 9 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB
+@item -3 @tab 8 MiB @tab 12 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB
+@item -4 @tab 12 MiB @tab 18 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB
+@item -5 @tab 16 MiB @tab 24 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB
+@item -6 @tab 32 MiB @tab 48 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB
+@item -7 @tab 64 MiB @tab 96 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB
+@item -8 @tab 96 MiB @tab 144 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB
+@item -9 @tab 128 MiB @tab 192 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB
+@end multitable
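Every cell in the table above is the number of processors times the per-level data size. The sketch below reproduces the table under one stated assumption: the per-level data sizes are inferred by halving the "2 processors" column, not read from plzip's source. All names are hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30

# Default data size per level in MiB, inferred from the table's
# "2 processors" column divided by 2 (an assumption of this sketch).
DATA_SIZE_MIB = {1: 2, 2: 3, 3: 4, 4: 6, 5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def min_file_size(level: int, processors: int) -> int:
    """Minimum uncompressed size in bytes that keeps all processors busy."""
    return processors * DATA_SIZE_MIB[level] * MIB


def format_size(size: int) -> str:
    """Render a byte count the way the table does (whole MiB or GiB)."""
    if size % GIB == 0:
        return f"{size // GIB} GiB"
    return f"{size // MIB} MiB"


if __name__ == "__main__":
    columns = (2, 3, 4, 8, 16, 64)
    for level in sorted(DATA_SIZE_MIB):
        cells = [format_size(min_file_size(level, n)) for n in columns]
        print(f"-{level}", *cells, sep="\t")
```

Checking a few cells against the manual (for example level -8 with 3 processors, or level -9 with 64) is a quick way to confirm the inferred data sizes are consistent with the whole table.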
+
+
 @node Problems
 @chapter Reporting bugs
 @cindex bugs