Merging upstream version 1.3~pre1.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent f04d94e9dd
commit e4e17ab53e
17 changed files with 387 additions and 259 deletions
@@ -1,5 +1,5 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH PLZIP "1" "August 2014" "plzip 1.2" "User Commands"
+.TH PLZIP "1" "November 2014" "plzip 1.3-pre1" "User Commands"
 .SH NAME
 plzip \- reduces the size of files
 .SH SYNOPSIS
@@ -70,8 +70,7 @@ Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...
 The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
 etc, you may need to use the \fB\-\-match\-length\fR and \fB\-\-dictionary\-size\fR
-options directly to achieve optimal performance. For example, \fB\-9m64\fR
-usually compresses executables more (and faster) than \fB\-9\fR.
+options directly to achieve optimal performance.
 .PP
 Exit status: 0 for a normal exit, 1 for environmental problems (file
 not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
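The exit status values quoted in this diff (0 and 1 in the man page hunk, with 2 and 3 completed by the `plzip.info` hunk header "invalid input file, 3 for an internal consistency error") can be turned into a small lookup for scripts that call plzip. This is a minimal sketch in Python, not part of the commit: the helper names `describe_exit_status` and `check_archive` are hypothetical, and `check_archive` assumes a `plzip` binary on `PATH` that accepts `-t` to test an archive.

```python
import subprocess

# Meanings taken from the documentation text quoted in this diff.
EXIT_STATUS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt or invalid input file",
    3: "internal consistency error (bug) that caused plzip to panic",
}


def describe_exit_status(code):
    """Return the documented meaning of a plzip exit status (hypothetical helper)."""
    return EXIT_STATUS.get(code, f"undocumented exit status {code}")


def check_archive(path):
    """Run `plzip -t` on path and return (ok, description).

    Assumes plzip is installed; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(["plzip", "-t", path], capture_output=True)
    return result.returncode == 0, describe_exit_status(result.returncode)
```

Keeping the lookup separate from the subprocess call lets the mapping be reused for any front end that reports plzip's status.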
145
doc/plzip.info
@@ -11,7 +11,7 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
 Plzip Manual
 ************

-This manual is for Plzip (version 1.2, 29 August 2014).
+This manual is for Plzip (version 1.3-pre1, 25 November 2014).

 * Menu:

@@ -19,6 +19,8 @@ This manual is for Plzip (version 1.2, 29 August 2014).
 * Program design:: Internal structure of plzip
 * Invoking plzip:: Command line interface
 * File format:: Detailed format of the compressed file
+* Memory requirements:: Memory required to compress and decompress
+* Minimum file sizes:: Minimum file sizes required for full speed
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts

@@ -40,16 +42,18 @@ the one of lzip, bzip2 or gzip.

 Plzip can compress/decompress large files on multiprocessor machines
 much faster than lzip, at the cost of a slightly reduced compression
-ratio. Note that the number of usable threads is limited by file size;
-on files larger than a few GB plzip can use hundreds of processors, but
-on files of only a few MB plzip is no faster than lzip.
+ratio (0.4 to 2 percent larger compressed files). Note that the number
+of usable threads is limited by file size; on files larger than a few GB
+plzip can use hundreds of processors, but on files of only a few MB
+plzip is no faster than lzip (*note Minimum file sizes::).

 Plzip uses the lzip file format; the files produced by plzip are
 fully compatible with lzip-1.4 or newer, and can be rescued with
 lziprecover.

-The lzip file format is designed for long-term data archiving, taking
-into account both data integrity and decoder availability:
+The lzip file format is designed for data sharing and long-term
+archiving, taking into account both data integrity and decoder
+availability:

 * The lzip format provides very safe integrity checking and some data
 recovery means. The lziprecover program can repair bit-flip errors
@@ -64,50 +68,23 @@ into account both data integrity and decoder availability:
 archaeologist to extract the data from a lzip file long after
 quantum computers eventually render LZMA obsolete.

-* Additionally lzip is copylefted, which guarantees that it will
-remain free forever.
+* Additionally the lzip reference implementation is copylefted, which
+guarantees that it will remain free forever.

 A nice feature of the lzip format is that a corrupt byte is easier to
 repair the nearer it is from the beginning of the file. Therefore, with
 the help of lziprecover, losing an entire archive just because of a
 corrupt byte near the beginning is a thing of the past.

-The member trailer stores the 32-bit CRC of the original data, the
-size of the original data and the size of the member. These values,
-together with the value remaining in the range decoder and the
-end-of-stream marker, provide a 4 factor integrity checking which
-guarantees that the decompressed version of the data is identical to
-the original. This guards against corruption of the compressed data,
-and against undetected bugs in plzip (hopefully very unlikely). The
-chances of data corruption going undetected are microscopic. Be aware,
-though, that the check occurs upon decompression, so it can only tell
-you that something is wrong. It can't help you recover the original
-uncompressed data.
-
 Plzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for other programs like
 tar or zutils.

-The amount of memory required *per thread* is approximately the
-following:
-
-* For compression; 3 times the data size (*note --data-size::) plus
-11 times the dictionary size.
-
-* For decompression or testing of a non-seekable file or of standard
-input; 2 times the dictionary size plus up to 32 MiB.
-
-* For decompression of a regular file to a non-seekable file or to
-standard output; the dictionary size plus up to 32 MiB.
-
-* For decompression of a regular file to another regular file, or for
-testing of a regular file; the dictionary size.
-
 Plzip will automatically use the smallest possible dictionary size
 for each file without exceeding the given limit. Keep in mind that the
 decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
+choice of dictionary size limit (*note Memory requirements::).

 When compressing, plzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
@@ -245,8 +222,8 @@ The format for running plzip is:
 value.

 Note that the number of usable threads is limited to
-ceil( file_size / data_size ) during compression (*note
---data-size::), and to the number of members in the input during
+ceil( file_size / data_size ) during compression (*note Minimum
+file sizes::), and to the number of members in the input during
 decompression.

 '-o FILE'
@@ -287,8 +264,8 @@ The format for running plzip is:
 When compressing, show the compression ratio for each file
 processed. A second '-v' shows the progress of compression.
 When decompressing or testing, further -v's (up to 4) increase the
-verbosity level, showing status, compression ratio, decompressed
-size, and compressed size.
+verbosity level, showing status, compression ratio, dictionary
+size, decompressed size, and compressed size.

 '-1 .. -9'
 Set the compression parameters (dictionary size and match length
@@ -299,8 +276,7 @@ The format for running plzip is:
 linear scale optimal for all files. If your files are large, very
 repetitive, etc, you may need to use the '--match-length' and
 '--dictionary-size' options directly to achieve optimal
-performance. For example, '-9m64' usually compresses executables
-more (and faster) than '-9'.
+performance.

 Level   Dictionary size   Match length limit
 -1      1 MiB             5 bytes
@@ -340,7 +316,7 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
 caused plzip to panic.


-File: plzip.info, Node: File format, Next: Problems, Prev: Invoking plzip, Up: Top
+File: plzip.info, Node: File format, Next: Memory requirements, Prev: Invoking plzip, Up: Top

 4 File format
 *************
@@ -413,9 +389,70 @@ additional information before, between, or after them.



-File: plzip.info, Node: Problems, Next: Concept index, Prev: File format, Up: Top
+File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: File format, Up: Top

-5 Reporting bugs
+5 Memory required to compress and decompress
+********************************************
+
+The amount of memory required *per thread* is approximately the
+following:
+
+* For compression; 11 times the dictionary size plus 3 times the
+data size (*note --data-size::).
+
+* For decompression of a regular (seekable) file to another regular
+file, or for testing of a regular file; the dictionary size. Note
+that regular files with more than 1024 bytes of trailing garbage
+are treated as non-seekable.
+
+* For testing of a non-seekable file or of standard input; the
+dictionary size plus up to 5 MiB.
+
+* For decompression of a regular file to a non-seekable file or to
+standard output; the dictionary size plus up to 32 MiB.
+
+* For decompression of a non-seekable file or of standard input; the
+dictionary size plus up to 35 MiB.
+
+
+File: plzip.info, Node: Minimum file sizes, Next: Problems, Prev: Memory requirements, Up: Top
+
+6 Minimum file sizes required for full compression speed
+********************************************************
+
+When compressing, plzip divides the input file into chunks and
+compresses as many chunks simultaneously as worker threads are chosen,
+creating a multi-member compressed file.
+
+For this to work as expected (and roughly multiply the compression
+speed by the number of available processors), the uncompressed file
+must be at least as large as the number of worker threads times the
+chunk size (*note --data-size::). Else some processors will not get any
+data to compress, and compression will be proportionally slower. The
+maximum speed increase achievable on a given file is limited by the
+ratio (file_size / data_size).
+
+The following table shows the minimum uncompressed file size needed
+for full use of N processors at a given compression level, using the
+default data size for each level:
+
+Processors         2         3         4         8        16        64
+-------------------------------------------------------------------------
+Level
+-1             4 MiB     6 MiB     8 MiB    16 MiB    32 MiB   128 MiB
+-2             6 MiB     9 MiB    12 MiB    24 MiB    48 MiB   192 MiB
+-3             8 MiB    12 MiB    16 MiB    32 MiB    64 MiB   256 MiB
+-4            12 MiB    18 MiB    24 MiB    48 MiB    96 MiB   384 MiB
+-5            16 MiB    24 MiB    32 MiB    64 MiB   128 MiB   512 MiB
+-6            32 MiB    48 MiB    64 MiB   128 MiB   256 MiB     1 GiB
+-7            64 MiB    96 MiB   128 MiB   256 MiB   512 MiB     2 GiB
+-8            96 MiB   144 MiB   192 MiB   384 MiB   768 MiB     3 GiB
+-9           128 MiB   192 MiB   256 MiB   512 MiB     1 GiB     4 GiB
+
+
+File: plzip.info, Node: Problems, Next: Concept index, Prev: Minimum file sizes, Up: Top
+
+7 Reporting bugs
 ****************

 There are probably bugs in plzip. There are certainly errors and
@@ -441,6 +478,8 @@ Concept index
 * getting help: Problems. (line 6)
 * introduction: Introduction. (line 6)
 * invoking: Invoking plzip. (line 6)
+* memory requirements: Memory requirements. (line 6)
+* minimum file sizes: Minimum file sizes. (line 6)
 * options: Invoking plzip. (line 6)
 * program design: Program design. (line 6)
 * usage: Invoking plzip. (line 6)
@@ -450,13 +489,15 @@ Concept index

 Tag Table:
 Node: Top221
-Node: Introduction847
-Node: Program design6279
-Node: Invoking plzip7868
-Ref: --data-size8313
-Node: File format13471
-Node: Problems15976
-Node: Concept index16505
+Node: Introduction994
+Node: Program design5290
+Node: Invoking plzip6879
+Ref: --data-size7324
+Node: File format12420
+Node: Memory requirements14936
+Node: Minimum file sizes15913
+Node: Problems17765
+Node: Concept index18301

 End Tag Table

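The per-thread memory figures in the new "Memory requirements" node are simple linear formulas, so they can be checked with a few lines of arithmetic. The sketch below is an illustration in Python, not code from plzip: the function names and the mode labels are hypothetical, and the overheads are the "up to" upper bounds quoted in the manual text above.

```python
MIB = 1024 * 1024

# Extra allowance on top of the dictionary size, per decompression/test mode.
# Mode names are hypothetical labels; values are the manual's "up to" bounds.
DECOMPRESS_OVERHEAD = {
    "regular_to_regular": 0,         # also covers testing a regular file
    "test_non_seekable": 5 * MIB,    # testing a non-seekable file or stdin
    "regular_to_stdout": 32 * MIB,   # regular file to non-seekable output
    "from_non_seekable": 35 * MIB,   # decompressing a non-seekable file or stdin
}


def compress_memory_per_thread(dictionary_size, data_size):
    """Approximate bytes per thread: 11 * dictionary size + 3 * data size."""
    return 11 * dictionary_size + 3 * data_size


def decompress_memory_per_thread(dictionary_size, mode):
    """Approximate upper bound in bytes per thread for the given mode."""
    return dictionary_size + DECOMPRESS_OVERHEAD[mode]


# Example sizes chosen for illustration: 8 MiB dictionary, 16 MiB data size.
per_thread = compress_memory_per_thread(8 * MIB, 16 * MIB)
print(per_thread // MIB, "MiB per compression thread")
```

Because the manual gives the figures per thread, a rough total for a run would multiply the result by the number of worker threads; that multiplication is an inference, not something the manual states.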
134
doc/plzip.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header

-@set UPDATED 29 August 2014
-@set VERSION 1.2
+@set UPDATED 25 November 2014
+@set VERSION 1.3-pre1

 @dircategory Data Compression
 @direntry
@@ -39,6 +39,8 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
 * Program design:: Internal structure of plzip
 * Invoking plzip:: Command line interface
 * File format:: Detailed format of the compressed file
+* Memory requirements:: Memory required to compress and decompress
+* Minimum file sizes:: Minimum file sizes required for full speed
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
 @end menu
@@ -60,15 +62,17 @@ the one of lzip, bzip2 or gzip.

 Plzip can compress/decompress large files on multiprocessor machines
 much faster than lzip, at the cost of a slightly reduced compression
-ratio. Note that the number of usable threads is limited by file size;
-on files larger than a few GB plzip can use hundreds of processors, but
-on files of only a few MB plzip is no faster than lzip.
+ratio (0.4 to 2 percent larger compressed files). Note that the number
+of usable threads is limited by file size; on files larger than a few GB
+plzip can use hundreds of processors, but on files of only a few MB
+plzip is no faster than lzip (@pxref{Minimum file sizes}).

 Plzip uses the lzip file format; the files produced by plzip are fully
 compatible with lzip-1.4 or newer, and can be rescued with lziprecover.

-The lzip file format is designed for long-term data archiving, taking
-into account both data integrity and decoder availability:
+The lzip file format is designed for data sharing and long-term
+archiving, taking into account both data integrity and decoder
+availability:

 @itemize @bullet
 @item
@@ -87,8 +91,8 @@ data from a lzip file long after quantum computers eventually render
 LZMA obsolete.

 @item
-Additionally lzip is copylefted, which guarantees that it will remain
-free forever.
+Additionally the lzip reference implementation is copylefted, which
+guarantees that it will remain free forever.
 @end itemize

 A nice feature of the lzip format is that a corrupt byte is easier to
@@ -96,47 +100,15 @@ repair the nearer it is from the beginning of the file. Therefore, with
 the help of lziprecover, losing an entire archive just because of a
 corrupt byte near the beginning is a thing of the past.

-The member trailer stores the 32-bit CRC of the original data, the size
-of the original data and the size of the member. These values, together
-with the value remaining in the range decoder and the end-of-stream
-marker, provide a 4 factor integrity checking which guarantees that the
-decompressed version of the data is identical to the original. This
-guards against corruption of the compressed data, and against undetected
-bugs in plzip (hopefully very unlikely). The chances of data corruption
-going undetected are microscopic. Be aware, though, that the check
-occurs upon decompression, so it can only tell you that something is
-wrong. It can't help you recover the original uncompressed data.
-
 Plzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for other programs like
 tar or zutils.

-The amount of memory required @strong{per thread} is approximately the
-following:
-
-@itemize @bullet
-@item
-For compression; 3 times the data size (@pxref{--data-size}) plus 11
-times the dictionary size.
-
-@item
-For decompression or testing of a non-seekable file or of standard
-input; 2 times the dictionary size plus up to 32 MiB.
-
-@item
-For decompression of a regular file to a non-seekable file or to
-standard output; the dictionary size plus up to 32 MiB.
-
-@item
-For decompression of a regular file to another regular file, or for
-testing of a regular file; the dictionary size.
-@end itemize
-
 Plzip will automatically use the smallest possible dictionary size for
 each file without exceeding the given limit. Keep in mind that the
 decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
+choice of dictionary size limit (@pxref{Memory requirements}).

 When compressing, plzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
@@ -277,8 +249,8 @@ detect the number of processors in the system and use it as default
 value. @w{@samp{plzip --help}} shows the system's default value.

 Note that the number of usable threads is limited to @w{ceil( file_size
-/ data_size )} during compression (@pxref{--data-size}), and to the
-number of members in the input during decompression.
+/ data_size )} during compression (@pxref{Minimum file sizes}), and to
+the number of members in the input during decompression.

 @item -o @var{file}
 @itemx --output=@var{file}
@@ -315,8 +287,8 @@ Verbose mode.@*
 When compressing, show the compression ratio for each file processed. A
 second @samp{-v} shows the progress of compression.@*
 When decompressing or testing, further -v's (up to 4) increase the
-verbosity level, showing status, compression ratio, decompressed size,
-and compressed size.
+verbosity level, showing status, compression ratio, dictionary size,
+decompressed size, and compressed size.

 @item -1 .. -9
 Set the compression parameters (dictionary size and match length limit)
@@ -327,8 +299,7 @@ The bidimensional parameter space of LZMA can't be mapped to a linear
 scale optimal for all files. If your files are large, very repetitive,
 etc, you may need to use the @samp{--match-length} and
 @samp{--dictionary-size} options directly to achieve optimal
-performance. For example, @samp{-9m64} usually compresses executables
-more (and faster) than @samp{-9}.
+performance.

 @multitable {Level} {Dictionary size} {Match length limit}
 @item Level @tab Dictionary size @tab Match length limit
@@ -449,6 +420,73 @@ facilitates safe recovery of undamaged members from multi-member files.
 @end table


+@node Memory requirements
+@chapter Memory required to compress and decompress
+@cindex memory requirements
+
+The amount of memory required @strong{per thread} is approximately the
+following:
+
+@itemize @bullet
+@item
+For compression; 11 times the dictionary size plus 3 times the data size
+(@pxref{--data-size}).
+
+@item
+For decompression of a regular (seekable) file to another regular file,
+or for testing of a regular file; the dictionary size. Note that regular
+files with more than 1024 bytes of trailing garbage are treated as
+non-seekable.
+
+@item
+For testing of a non-seekable file or of standard input; the dictionary
+size plus up to 5 MiB.
+
+@item
+For decompression of a regular file to a non-seekable file or to
+standard output; the dictionary size plus up to 32 MiB.
+
+@item
+For decompression of a non-seekable file or of standard input; the
+dictionary size plus up to 35 MiB.
+@end itemize
+
+
+@node Minimum file sizes
+@chapter Minimum file sizes required for full compression speed
+@cindex minimum file sizes
+
+When compressing, plzip divides the input file into chunks and
+compresses as many chunks simultaneously as worker threads are chosen,
+creating a multi-member compressed file.
+
+For this to work as expected (and roughly multiply the compression speed
+by the number of available processors), the uncompressed file must be at
+least as large as the number of worker threads times the chunk size
+(@pxref{--data-size}). Else some processors will not get any data to
+compress, and compression will be proportionally slower. The maximum
+speed increase achievable on a given file is limited by the ratio
+@w{(file_size / data_size)}.
+
+The following table shows the minimum uncompressed file size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {128 MiB} {128 MiB} {128 MiB} {128 MiB} {128 MiB} {128 MiB}
+@headitem Processors @tab 2 @tab 3 @tab 4 @tab 8 @tab 16 @tab 64
+@item Level
+@item -1 @tab 4 MiB @tab 6 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB
+@item -2 @tab 6 MiB @tab 9 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB
+@item -3 @tab 8 MiB @tab 12 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB
+@item -4 @tab 12 MiB @tab 18 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB
+@item -5 @tab 16 MiB @tab 24 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB
+@item -6 @tab 32 MiB @tab 48 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB
+@item -7 @tab 64 MiB @tab 96 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB
+@item -8 @tab 96 MiB @tab 144 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB
+@item -9 @tab 128 MiB @tab 192 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB
+@end multitable
+
+
 @node Problems
 @chapter Reporting bugs
 @cindex bugs
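The thread limit `ceil( file_size / data_size )` and the minimum-file-size table added in this commit follow the same rule: a file keeps N processors busy only if it holds at least N chunks. The following Python sketch illustrates that arithmetic under stated assumptions: the function names are hypothetical, and the per-level chunk sizes are derived by dividing the table's 2-processor column by 2, not read from plzip itself.

```python
MIB = 1024 * 1024

# Default data (chunk) size per level in MiB, inferred from the table's
# 2-processor column divided by 2. An assumption, not taken from plzip's source.
DEFAULT_DATA_SIZE_MIB = {1: 2, 2: 3, 3: 4, 4: 6, 5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def usable_threads(file_size, data_size, requested_threads):
    """Threads that actually receive a chunk during compression."""
    chunks = -(-file_size // data_size)  # integer ceil(file_size / data_size)
    return min(requested_threads, chunks)


def min_file_size(level, processors):
    """Smallest uncompressed size (bytes) that gives every processor a chunk."""
    return processors * DEFAULT_DATA_SIZE_MIB[level] * MIB


# A 10 MiB file at level -1 (2 MiB chunks) can occupy at most 5 of 8 threads.
print(usable_threads(10 * MIB, 2 * MIB, 8))
# Level -9 on 64 processors needs a 4 GiB input, matching the table.
print(min_file_size(9, 64) // MIB, "MiB")
```

Integer ceiling division is used instead of `math.ceil` on a float quotient so that very large file sizes do not lose precision.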