Adding upstream version 1.7.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent abd87c79b1
commit 96fff67cb2
20 changed files with 841 additions and 444 deletions
|
@@ -1,5 +1,5 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||
.TH PLZIP "1" "April 2017" "plzip 1.6" "User Commands"
|
||||
.TH PLZIP "1" "February 2018" "plzip 1.7" "User Commands"
|
||||
.SH NAME
|
||||
plzip \- reduces the size of files
|
||||
.SH SYNOPSIS
|
||||
|
@@ -68,6 +68,9 @@ alias for \fB\-0\fR
|
|||
.TP
|
||||
\fB\-\-best\fR
|
||||
alias for \fB\-9\fR
|
||||
.TP
|
||||
\fB\-\-loose\-trailing\fR
|
||||
allow trailing data seeming corrupt header
|
||||
.PP
|
||||
If no file names are given, or if a file is '\-', plzip compresses or
|
||||
decompresses from standard input to standard output.
|
||||
|
@@ -92,8 +95,8 @@ Plzip home page: http://www.nongnu.org/lzip/plzip.html
|
|||
.SH COPYRIGHT
|
||||
Copyright \(co 2009 Laszlo Ersek.
|
||||
.br
|
||||
Copyright \(co 2017 Antonio Diaz Diaz.
|
||||
Using lzlib 1.9
|
||||
Copyright \(co 2018 Antonio Diaz Diaz.
|
||||
Using lzlib 1.10
|
||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
|
|
268
doc/plzip.info
|
@@ -11,11 +11,12 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Plzip Manual
|
||||
************
|
||||
|
||||
This manual is for Plzip (version 1.6, 12 April 2017).
|
||||
This manual is for Plzip (version 1.7, 7 February 2018).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of plzip
|
||||
* Output:: Meaning of plzip's output
|
||||
* Invoking plzip:: Command line interface
|
||||
* Program design:: Internal structure of plzip
|
||||
* File format:: Detailed format of the compressed file
|
||||
|
@@ -27,13 +28,13 @@ This manual is for Plzip (version 1.6, 12 April 2017).
|
|||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2009-2017 Antonio Diaz Diaz.
|
||||
Copyright (C) 2009-2018 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to
|
||||
copy, distribute and modify it.
|
||||
|
||||
|
||||
File: plzip.info, Node: Introduction, Next: Invoking plzip, Prev: Top, Up: Top
|
||||
File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
|
||||
|
||||
1 Introduction
|
||||
**************
|
||||
|
@@ -58,7 +59,7 @@ archiving, taking into account both data integrity and decoder
|
|||
availability:
|
||||
|
||||
* The lzip format provides very safe integrity checking and some data
|
||||
recovery means. The lziprecover program can repair bit-flip errors
|
||||
recovery means. The lziprecover program can repair bit flip errors
|
||||
(one of the most common forms of data corruption) in lzip files,
|
||||
and provides data recovery capabilities, including error-checked
|
||||
merging of damaged copies of a file. *Note Data safety:
|
||||
|
@@ -114,17 +115,60 @@ entirely incomprehensible and therefore pointless.
|
|||
|
||||
Plzip will correctly decompress a file which is the concatenation of
|
||||
two or more compressed files. The result is the concatenation of the
|
||||
corresponding uncompressed files. Integrity testing of concatenated
|
||||
corresponding decompressed files. Integrity testing of concatenated
|
||||
compressed files is also supported.
|
||||
|
||||
|
||||
File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top
|
||||
|
||||
2 Meaning of plzip's output
|
||||
***************************
|
||||
|
||||
The output of plzip looks like this:
|
||||
|
||||
plzip -v foo
|
||||
foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
|
||||
|
||||
plzip -tvv foo.lz
|
||||
foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
|
||||
|
||||
The meaning of each field is as follows:
|
||||
|
||||
'N:1'
|
||||
The compression ratio (uncompressed_size / compressed_size), shown
|
||||
as N to 1.
|
||||
|
||||
'ratio'
|
||||
The inverse compression ratio
|
||||
(compressed_size / uncompressed_size), shown as a percentage. A
|
||||
decimal ratio is easily obtained by moving the decimal point two
|
||||
places to the left; 14.98% = 0.1498.
|
||||
|
||||
'saved'
|
||||
The space saved by compression (1 - ratio), shown as a percentage.
|
||||
|
||||
'in'
|
||||
The size of the uncompressed data. When decompressing or testing,
|
||||
it is shown as 'decompressed'. Note that plzip always prints the
|
||||
uncompressed size before the compressed size when compressing,
|
||||
decompressing, testing or listing.
|
||||
|
||||
'out'
|
||||
The size of the compressed data. When decompressing or testing, it
|
||||
is shown as 'compressed'.
|
||||
|
||||
|
||||
When decompressing or testing at verbosity level 4 (-vvvv), the
|
||||
dictionary size used to compress the file is also shown.
|
||||
|
||||
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may
|
||||
never have been compressed. Decompressed is used to refer to data which
|
||||
have undergone the process of decompression.
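
The relation between the fields above can be checked with a short
sketch. It is only an illustration written for this note, not code from
plzip; the function name 'summarize' and the rounding widths are
assumptions chosen to mirror the sample line for 'plzip -v foo'.

```python
def summarize(uncompressed_size: int, compressed_size: int) -> str:
    """Build a line with the same fields as the sample 'plzip -v' output."""
    n_to_1 = uncompressed_size / compressed_size          # 'N:1' field
    ratio = 100.0 * compressed_size / uncompressed_size   # 'ratio' field, in percent
    saved = 100.0 - ratio                                 # 'saved' = 1 - ratio, in percent
    return (f"{n_to_1:.3f}:1, {ratio:.2f}% ratio, {saved:.2f}% saved, "
            f"{uncompressed_size} in, {compressed_size} out.")


# Sizes taken from the example in the manual: 450560 in, 67493 out.
print(summarize(450560, 67493))
# prints "6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out."
```

Note that 'ratio' and 'saved' always add up to 100%, and that 'N:1' is
the reciprocal of the decimal ratio (1 / 0.1498 is about 6.676).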
|
||||
|
||||
|
||||
File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Introduction, Up: Top
|
||||
File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top
|
||||
|
||||
2 Invoking plzip
|
||||
3 Invoking plzip
|
||||
****************
|
||||
|
||||
The format for running plzip is:
|
||||
|
@@ -135,7 +179,7 @@ The format for running plzip is:
|
|||
other FILES and is read just once, the first time it appears in the
|
||||
command line.
|
||||
|
||||
Plzip supports the following options:
|
||||
plzip supports the following options:
|
||||
|
||||
'-h'
|
||||
'--help'
|
||||
|
@@ -154,12 +198,12 @@ command line.
|
|||
|
||||
'-B BYTES'
|
||||
'--data-size=BYTES'
|
||||
Set the size of the input data blocks, in bytes. The input file
|
||||
will be divided in chunks of this size before compression is
|
||||
performed. Valid values range from 8 KiB to 1 GiB. Default value
|
||||
is two times the dictionary size, except for option '-0' where it
|
||||
defaults to 1 MiB. Plzip will reduce the dictionary size if it is
|
||||
larger than the chosen data size.
|
||||
When compressing, set the size of the input data blocks in bytes.
|
||||
The input file will be divided in chunks of this size before
|
||||
compression is performed. Valid values range from 8 KiB to 1 GiB.
|
||||
Default value is two times the dictionary size, except for option
|
||||
'-0' where it defaults to 1 MiB. Plzip will reduce the dictionary
|
||||
size if it is larger than the chosen data size.
|
||||
|
||||
'-c'
|
||||
'--stdout'
|
||||
|
@@ -170,10 +214,10 @@ command line.
|
|||
|
||||
'-d'
|
||||
'--decompress'
|
||||
Decompress the specified file(s). If a file does not exist or
|
||||
can't be opened, plzip continues decompressing the rest of the
|
||||
files. If a file fails to decompress, plzip exits immediately
|
||||
without decompressing the rest of the files.
|
||||
Decompress the specified files. If a file does not exist or can't
|
||||
be opened, plzip continues decompressing the rest of the files. If
|
||||
a file fails to decompress, or is a terminal, plzip exits
|
||||
immediately without decompressing the rest of the files.
|
||||
|
||||
'-f'
|
||||
'--force'
|
||||
|
@@ -181,8 +225,8 @@ command line.
|
|||
|
||||
'-F'
|
||||
'--recompress'
|
||||
Force re-compression of files whose name already has the '.lz' or
|
||||
'.tlz' suffix.
|
||||
When compressing, force re-compression of files whose name already
|
||||
has the '.lz' or '.tlz' suffix.
|
||||
|
||||
'-k'
|
||||
'--keep'
|
||||
|
@@ -192,7 +236,7 @@ command line.
|
|||
'-l'
|
||||
'--list'
|
||||
Print the uncompressed size, compressed size and percentage saved
|
||||
of the specified file(s). Trailing data are ignored. The values
|
||||
of the specified files. Trailing data are ignored. The values
|
||||
produced are correct even for multimember files. If more than one
|
||||
file is given, a final line containing the cumulative sizes is
|
||||
printed. With '-v', the dictionary size, the number of members in
|
||||
|
@@ -206,18 +250,21 @@ command line.
|
|||
|
||||
'-m BYTES'
|
||||
'--match-length=BYTES'
|
||||
Set the match length limit in bytes. After a match this long is
|
||||
found, the search is finished. Valid values range from 5 to 273.
|
||||
Larger values usually give better compression ratios but longer
|
||||
compression times.
|
||||
When compressing, set the match length limit in bytes. After a
|
||||
match this long is found, the search is finished. Valid values
|
||||
range from 5 to 273. Larger values usually give better compression
|
||||
ratios but longer compression times.
|
||||
|
||||
'-n N'
|
||||
'--threads=N'
|
||||
Set the number of worker threads. Valid values range from 1 to "as
|
||||
many as your system can support". If this option is not used,
|
||||
plzip tries to detect the number of processors in the system and
|
||||
use it as default value. 'plzip --help' shows the system's default
|
||||
value.
|
||||
Set the number of worker threads, overriding the system's default.
|
||||
Valid values range from 1 to "as many as your system can support".
|
||||
If this option is not used, plzip tries to detect the number of
|
||||
processors in the system and use it as default value. When
|
||||
compressing on a 32 bit system, plzip tries to limit the memory
|
||||
use to under 2.22 GiB (4 worker threads at level -9) by reducing
|
||||
the number of threads below the system's default. 'plzip --help'
|
||||
shows the system's default value.
|
||||
|
||||
Note that the number of usable threads is limited to
|
||||
ceil( file_size / data_size ) during compression (*note Minimum
|
||||
|
@@ -228,8 +275,9 @@ command line.
|
|||
'--output=FILE'
|
||||
When reading from standard input and '--stdout' has not been
|
||||
specified, use 'FILE' as the virtual name of the uncompressed
|
||||
file. This produces a file named 'FILE' when decompressing, and a
|
||||
file named 'FILE.lz' when compressing.
|
||||
file. This produces a file named 'FILE' when decompressing, or a
|
||||
file named 'FILE.lz' when compressing. A second '.lz' extension is
|
||||
not added if 'FILE' already ends in '.lz' or '.tlz'.
|
||||
|
||||
'-q'
|
||||
'--quiet'
|
||||
|
@@ -237,13 +285,13 @@ command line.
|
|||
|
||||
'-s BYTES'
|
||||
'--dictionary-size=BYTES'
|
||||
Set the dictionary size limit in bytes. Plzip will use the smallest
|
||||
possible dictionary size for each file without exceeding this
|
||||
limit. Valid values range from 4 KiB to 512 MiB. Values 12 to 29
|
||||
are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
|
||||
that dictionary sizes are quantized. If the specified size does
|
||||
not match one of the valid sizes, it will be rounded upwards by
|
||||
adding up to (BYTES / 8) to it.
|
||||
When compressing, set the dictionary size limit in bytes. Plzip
|
||||
will use the smallest possible dictionary size for each file
|
||||
without exceeding this limit. Valid values range from 4 KiB to
|
||||
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
|
||||
2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
|
||||
the specified size does not match one of the valid sizes, it will
|
||||
be rounded upwards by adding up to (BYTES / 8) to it.
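
As an illustration of how the argument of '--dictionary-size' is read
according to the text above, here is a minimal sketch. It is not taken
from the plzip sources: the function name and the error message are
hypothetical, multiplier suffixes are not parsed, and the quantization
(rounding upwards to a valid size) is deliberately left out.

```python
MIN_DICT = 4 * 1024             # 4 KiB, lower limit stated in the manual
MAX_DICT = 512 * 1024 * 1024    # 512 MiB, upper limit stated in the manual


def dictionary_size_limit(value: int) -> int:
    """Interpret a numeric '-s' argument as a size limit in bytes."""
    if 12 <= value <= 29:
        # Values 12 to 29 are taken as exponents: 2^12 to 2^29 bytes.
        return 1 << value
    if MIN_DICT <= value <= MAX_DICT:
        # Anything else must already be a byte count inside the valid range.
        return value
    raise ValueError(f"dictionary size out of range: {value}")


print(dictionary_size_limit(12))       # prints 4096 (4 KiB)
print(dictionary_size_limit(29))       # prints 536870912 (512 MiB)
print(dictionary_size_limit(1 << 20))  # prints 1048576 (1 MiB, given in bytes)
```

The two ends of the exponent range coincide with the two ends of the
byte range: 2^12 bytes is 4 KiB and 2^29 bytes is 512 MiB.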
|
||||
|
||||
For maximum compression you should use a dictionary size limit as
|
||||
large as possible, but keep in mind that the decompression memory
|
||||
|
@@ -252,10 +300,10 @@ command line.
|
|||
|
||||
'-t'
|
||||
'--test'
|
||||
Check integrity of the specified file(s), but don't decompress
|
||||
them. This really performs a trial decompression and throws away
|
||||
the result. Use it together with '-v' to see information about
|
||||
the file(s). If a file does not exist, can't be opened, or is a
|
||||
Check integrity of the specified files, but don't decompress them.
|
||||
This really performs a trial decompression and throws away the
|
||||
result. Use it together with '-v' to see information about the
|
||||
files. If a file does not exist, can't be opened, or is a
|
||||
terminal, plzip continues checking the rest of the files. If a
|
||||
file fails the test, plzip may be unable to check the rest of the
|
||||
files.
|
||||
|
@@ -263,17 +311,19 @@ command line.
|
|||
'-v'
|
||||
'--verbose'
|
||||
Verbose mode.
|
||||
When compressing, show the compression ratio for each file
|
||||
processed. A second '-v' shows the progress of compression.
|
||||
When compressing, show the compression ratio and size for each file
|
||||
processed.
|
||||
When decompressing or testing, further -v's (up to 4) increase the
|
||||
verbosity level, showing status, compression ratio, dictionary
|
||||
size, decompressed size, and compressed size.
|
||||
Two or more '-v' options show the progress of (de)compression,
|
||||
except for single-member files.
|
||||
|
||||
'-0 .. -9'
|
||||
Set the compression parameters (dictionary size and match length
|
||||
limit) as shown in the table below. The default compression level
|
||||
is '-6'. Note that '-9' can be much slower than '-0'. These
|
||||
options have no effect when decompressing.
|
||||
options have no effect when decompressing, testing or listing.
|
||||
|
||||
The bidimensional parameter space of LZMA can't be mapped to a
|
||||
linear scale optimal for all files. If your files are large, very
|
||||
|
@@ -296,6 +346,13 @@ command line.
|
|||
'--best'
|
||||
Aliases for GNU gzip compatibility.
|
||||
|
||||
'--loose-trailing'
|
||||
When decompressing, testing or listing, allow trailing data whose
|
||||
first bytes are so similar to the magic bytes of a lzip header
|
||||
that they can be confused with a corrupt header. Use this option
|
||||
if a file triggers a "corrupt header" error and the cause is not
|
||||
indeed a corrupt header.
|
||||
|
||||
|
||||
Numbers given as arguments to options may be followed by a multiplier
|
||||
and an optional 'B' for "byte".
|
||||
|
@@ -321,7 +378,7 @@ caused plzip to panic.
|
|||
|
||||
File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top
|
||||
|
||||
3 Program design
|
||||
4 Program design
|
||||
****************
|
||||
|
||||
When compressing, plzip divides the input file into chunks and
|
||||
|
@@ -344,6 +401,17 @@ them to the workers. The workers (de)compress the blocks received from
|
|||
the splitter. The muxer collects processed packets from the workers, and
|
||||
writes them to the output file.
|
||||
|
||||
,------------,
|
||||
,-->| worker 0 |--,
|
||||
| `------------' |
|
||||
,-------, ,----------, | ,------------, | ,-------, ,--------,
|
||||
| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output |
|
||||
| file | `----------' | `------------' | `-------' | file |
|
||||
`-------' | ... | `--------'
|
||||
| ,------------, |
|
||||
`-->| worker N-1 |--'
|
||||
`------------'
|
||||
|
||||
When decompressing from a regular file, the splitter is removed and
|
||||
the workers read directly from the input file. If the output file is
|
||||
also a regular file, the muxer is also removed and the workers write
|
||||
|
@@ -355,7 +423,7 @@ I/O speed.
|
|||
|
||||
File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top
|
||||
|
||||
4 File format
|
||||
5 File format
|
||||
*************
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
|
@@ -426,17 +494,11 @@ additional information before, between, or after them.
|
|||
|
||||
File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: File format, Up: Top
|
||||
|
||||
5 Memory required to compress and decompress
|
||||
6 Memory required to compress and decompress
|
||||
********************************************
|
||||
|
||||
The amount of memory required *per thread* is approximately the
|
||||
following:
|
||||
|
||||
* For compression at level -0; 1.5 MiB plus 3 times the data size
|
||||
(*note --data-size::). Default is 4.5 MiB.
|
||||
|
||||
* For compression at other levels; 11 times the dictionary size plus
|
||||
3 times the data size. Default is 136 MiB.
|
||||
The amount of memory required *per thread* for decompression or testing
|
||||
is approximately the following:
|
||||
|
||||
* For decompression of a regular (seekable) file to another regular
|
||||
file, or for testing of a regular file; the dictionary size.
|
||||
|
@@ -450,10 +512,35 @@ following:
|
|||
* For decompression of a non-seekable file or of standard input; the
|
||||
dictionary size plus up to 35 MiB.
|
||||
|
||||
The amount of memory required *per thread* for compression is
|
||||
approximately the following:
|
||||
|
||||
* For compression at level -0; 1.5 MiB plus 3.375 times the data size
|
||||
(*note --data-size::). Default is 4.875 MiB.
|
||||
|
||||
* For compression at other levels; 11 times the dictionary size plus
|
||||
3.375 times the data size. Default is 142 MiB.
|
||||
|
||||
The following table shows the memory required *per thread* for
|
||||
compression at a given level, using the default data size for each
|
||||
level:
|
||||
|
||||
Level Memory required
|
||||
-0 4.875 MiB
|
||||
-1 17.75 MiB
|
||||
-2 26.625 MiB
|
||||
-3 35.5 MiB
|
||||
-4 53.25 MiB
|
||||
-5 71 MiB
|
||||
-6 142 MiB
|
||||
-7 284 MiB
|
||||
-8 426 MiB
|
||||
-9 568 MiB
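
The table above follows from the two formulas that precede it. The
sketch below reproduces it; it is an illustration only, not plzip code.
The per-level dictionary sizes are an assumption inferred by dividing
each table entry by 17.75 (11 + 3.375 * 2, since the default data size
is two times the dictionary size).

```python
# Dictionary size in MiB for levels -1 to -9 (assumed, inferred from the table).
DICT_MIB = {1: 1, 2: 1.5, 3: 2, 4: 3, 5: 4, 6: 8, 7: 16, 8: 24, 9: 32}


def compression_memory_mib(level: int) -> float:
    """Approximate memory per worker thread, in MiB, at default data size."""
    if level == 0:
        data_size = 1.0                    # '-0' defaults to a 1 MiB data size
        return 1.5 + 3.375 * data_size
    dict_size = DICT_MIB[level]
    data_size = 2 * dict_size              # default: two times the dictionary size
    return 11 * dict_size + 3.375 * data_size


for level in range(10):
    print(f"-{level}  {compression_memory_mib(level):g} MiB")
```

Four worker threads at level -9 need about 4 * 568 MiB = 2272 MiB, which
is the 2.22 GiB figure quoted for 32 bit systems in the description of
'--threads'.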
|
||||
|
||||
|
||||
File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory requirements, Up: Top
|
||||
|
||||
6 Minimum file sizes required for full compression speed
|
||||
7 Minimum file sizes required for full compression speed
|
||||
********************************************************
|
||||
|
||||
When compressing, plzip divides the input file into chunks and
|
||||
|
@@ -466,7 +553,8 @@ must be at least as large as the number of worker threads times the
|
|||
chunk size (*note --data-size::). Else some processors will not get any
|
||||
data to compress, and compression will be proportionally slower. The
|
||||
maximum speed increase achievable on a given file is limited by the
|
||||
ratio (file_size / data_size).
|
||||
ratio (file_size / data_size). For example, a tarball the size of gcc or
|
||||
linux will scale up to 8 processors at level -9.
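
The limit described above can be sketched as follows. This is an
illustration, not plzip code: the function name is hypothetical, a
non-empty input file is assumed, and the 64 MiB data size used for level
-9 is an assumption (two times an assumed 32 MiB dictionary).

```python
MIB = 1024 * 1024


def usable_threads(file_size: int, data_size: int, requested: int) -> int:
    """Worker threads that can actually receive a chunk during compression."""
    chunks = -(-file_size // data_size)    # ceil(file_size / data_size), integers only
    return min(requested, chunks)


# With an assumed 64 MiB data size, 8 workers need at least 8 chunks (512 MiB).
print(usable_threads(512 * MIB, 64 * MIB, 8))  # prints 8
print(usable_threads(100 * MIB, 64 * MIB, 8))  # prints 2
```

In the second call only two chunks exist, so six of the eight requested
workers would get no data to compress.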
|
||||
|
||||
The following table shows the minimum uncompressed file size needed
|
||||
for full use of N processors at a given compression level, using the
|
||||
|
@@ -489,7 +577,7 @@ Level
|
|||
|
||||
File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file sizes, Up: Top
|
||||
|
||||
7 Extra data appended to the file
|
||||
8 Extra data appended to the file
|
||||
*********************************
|
||||
|
||||
Sometimes extra data are found appended to a lzip file after the last
|
||||
|
@@ -501,10 +589,11 @@ member. Such trailing data may be:
|
|||
|
||||
* Useful data added by the user; a cryptographically secure hash, a
|
||||
description of file contents, etc. It is safe to append any amount
|
||||
of text to a lzip file as long as the text does not begin with the
|
||||
string "LZIP", and does not contain any zero bytes (null
|
||||
characters). Nonzero bytes and zero bytes can't be safely mixed in
|
||||
trailing data.
|
||||
of text to a lzip file as long as none of the first four bytes of
|
||||
the text match the corresponding byte in the string "LZIP", and
|
||||
the text does not contain any zero bytes (null characters).
|
||||
Nonzero bytes and zero bytes can't be safely mixed in trailing
|
||||
data.
|
||||
|
||||
* Garbage added by some not totally successful copy operation.
|
||||
|
||||
|
@@ -512,12 +601,17 @@ member. Such trailing data may be:
|
|||
and hash value (for a chosen hash) coincide with those of another
|
||||
file.
|
||||
|
||||
* In very rare cases, trailing data could be the corrupt header of
|
||||
another member. In multimember or concatenated files the
|
||||
probability of corruption happening in the magic bytes is 5 times
|
||||
smaller than the probability of getting a false positive caused by
|
||||
the corruption of the integrity information itself. Therefore it
|
||||
can be considered to be below the noise level.
|
||||
* In rare cases, trailing data could be the corrupt header of another
|
||||
member. In multimember or concatenated files the probability of
|
||||
corruption happening in the magic bytes is 5 times smaller than the
|
||||
probability of getting a false positive caused by the corruption
|
||||
of the integrity information itself. Therefore it can be
|
||||
considered to be below the noise level. Additionally, the test
|
||||
used by plzip to discriminate trailing data from a corrupt header
|
||||
has a Hamming distance (HD) of 3, and the 3 bit flips must happen
|
||||
in different magic bytes for the test to fail. In any case, the
|
||||
option '--trailing-error' guarantees that any corrupt header will
|
||||
be detected.
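
The rule for safely appended text given in the first item of the list
above (version 1.7 wording) can be expressed as a small check. This is a
sketch of the documented rule only, under the assumption that the text
is available as bytes; it is not the test plzip itself uses to tell
trailing data from a corrupt header, and the function name is
hypothetical.

```python
MAGIC = b"LZIP"   # magic string named in the manual


def is_safe_trailing_text(text: bytes) -> bool:
    """Check text against the manual's rule for data appended to a lzip file."""
    if 0 in text:
        # The text must not contain any zero bytes (null characters).
        return False
    # None of the first four bytes may match the corresponding byte of "LZIP".
    return all(byte != magic for byte, magic in zip(text[:4], MAGIC))


print(is_safe_trailing_text(b"sha256: 0123abcd\n"))  # prints True
print(is_safe_trailing_text(b"LZIP made this\n"))    # prints False
print(is_safe_trailing_text(b"aZbc\n"))              # prints False (second byte is 'Z')
print(is_safe_trailing_text(b"note\x00"))            # prints False (zero byte)
```

The third call shows the difference from the older wording: the text
does not begin with "LZIP", but one of its first four bytes matches the
magic string at the same position, so it is rejected.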
|
||||
|
||||
Trailing data are in no way part of the lzip file format, but tools
|
||||
reading lzip files are expected to behave as correctly and usefully as
|
||||
|
@@ -531,7 +625,7 @@ cases where a file containing trailing data must be rejected, the option
|
|||
|
||||
File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top
|
||||
|
||||
8 A small tutorial with examples
|
||||
9 A small tutorial with examples
|
||||
********************************
|
||||
|
||||
WARNING! Even if plzip is bug-free, other causes may result in a corrupt
|
||||
|
@@ -595,8 +689,8 @@ to decompressed byte 15000 (5000 bytes are produced).
|
|||
|
||||
File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||
|
||||
9 Reporting bugs
|
||||
****************
|
||||
10 Reporting bugs
|
||||
*****************
|
||||
|
||||
There are probably bugs in plzip. There are certainly errors and
|
||||
omissions in this manual. If you report them, they will get fixed. If
|
||||
|
@@ -625,6 +719,7 @@ Concept index
|
|||
* memory requirements: Memory requirements. (line 6)
|
||||
* minimum file sizes: Minimum file sizes. (line 6)
|
||||
* options: Invoking plzip. (line 6)
|
||||
* output: Output. (line 6)
|
||||
* program design: Program design. (line 6)
|
||||
* trailing data: Trailing data. (line 6)
|
||||
* usage: Invoking plzip. (line 6)
|
||||
|
@@ -634,19 +729,20 @@ Concept index
|
|||
|
||||
Tag Table:
|
||||
Node: Top221
|
||||
Node: Introduction1103
|
||||
Node: Invoking plzip5274
|
||||
Ref: --trailing-error5843
|
||||
Ref: --data-size6086
|
||||
Node: Program design12796
|
||||
Node: File format14383
|
||||
Node: Memory requirements16815
|
||||
Node: Minimum file sizes17815
|
||||
Node: Trailing data19741
|
||||
Node: Examples21648
|
||||
Ref: concat-example22813
|
||||
Node: Problems23388
|
||||
Node: Concept index23914
|
||||
Node: Introduction1158
|
||||
Node: Output5134
|
||||
Node: Invoking plzip6614
|
||||
Ref: --trailing-error7177
|
||||
Ref: --data-size7420
|
||||
Node: Program design14938
|
||||
Node: File format17090
|
||||
Node: Memory requirements19522
|
||||
Node: Minimum file sizes20985
|
||||
Node: Trailing data23002
|
||||
Node: Examples25285
|
||||
Ref: concat-example26450
|
||||
Node: Problems27025
|
||||
Node: Concept index27553
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|
235
doc/plzip.texi
|
@@ -6,8 +6,8 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 12 April 2017
|
||||
@set VERSION 1.6
|
||||
@set UPDATED 7 February 2018
|
||||
@set VERSION 1.7
|
||||
|
||||
@dircategory Data Compression
|
||||
@direntry
|
||||
|
@@ -36,6 +36,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
|
|||
|
||||
@menu
|
||||
* Introduction:: Purpose and features of plzip
|
||||
* Output:: Meaning of plzip's output
|
||||
* Invoking plzip:: Command line interface
|
||||
* Program design:: Internal structure of plzip
|
||||
* File format:: Detailed format of the compressed file
|
||||
|
@@ -48,7 +49,7 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
|
|||
@end menu
|
||||
|
||||
@sp 1
|
||||
Copyright @copyright{} 2009-2017 Antonio Diaz Diaz.
|
||||
Copyright @copyright{} 2009-2018 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission
|
||||
to copy, distribute and modify it.
|
||||
|
@@ -81,7 +82,7 @@ availability:
|
|||
The lzip format provides very safe integrity checking and some data
|
||||
recovery means. The
|
||||
@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
|
||||
program can repair bit-flip errors (one of the most common forms of data
|
||||
program can repair bit flip errors (one of the most common forms of data
|
||||
corruption) in lzip files, and provides data recovery capabilities,
|
||||
including error-checked merging of damaged copies of a file.
|
||||
@ifnothtml
|
||||
|
@@ -143,9 +144,54 @@ incomprehensible and therefore pointless.
|
|||
|
||||
Plzip will correctly decompress a file which is the concatenation of two
|
||||
or more compressed files. The result is the concatenation of the
|
||||
corresponding uncompressed files. Integrity testing of concatenated
|
||||
corresponding decompressed files. Integrity testing of concatenated
|
||||
compressed files is also supported.
|
||||
|
||||
|
||||
@node Output
|
||||
@chapter Meaning of plzip's output
|
||||
@cindex output
|
||||
|
||||
The output of plzip looks like this:
|
||||
|
||||
@example
|
||||
plzip -v foo
|
||||
foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
|
||||
|
||||
plzip -tvv foo.lz
|
||||
foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
|
||||
@end example
|
||||
|
||||
The meaning of each field is as follows:
|
||||
|
||||
@table @code
|
||||
@item N:1
|
||||
The compression ratio @w{(uncompressed_size / compressed_size)}, shown
|
||||
as N to 1.
|
||||
|
||||
@item ratio
|
||||
The inverse compression ratio @w{(compressed_size / uncompressed_size)},
|
||||
shown as a percentage. A decimal ratio is easily obtained by moving the
|
||||
decimal point two places to the left; @w{14.98% = 0.1498}.
|
||||
|
||||
@item saved
|
||||
The space saved by compression @w{(1 - ratio)}, shown as a percentage.
|
||||
|
||||
@item in
|
||||
The size of the uncompressed data. When decompressing or testing, it is
|
||||
shown as @code{decompressed}. Note that plzip always prints the
|
||||
uncompressed size before the compressed size when compressing,
|
||||
decompressing, testing or listing.
|
||||
|
||||
@item out
|
||||
The size of the compressed data. When decompressing or testing, it is
|
||||
shown as @code{compressed}.
|
||||
|
||||
@end table
|
||||
|
||||
When decompressing or testing at verbosity level 4 (-vvvv), the
|
||||
dictionary size used to compress the file is also shown.
|
||||
|
||||
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
|
||||
have been compressed. Decompressed is used to refer to data which have
|
||||
undergone the process of decompression.
|
||||
|
@@ -169,7 +215,7 @@ plzip [@var{options}] [@var{files}]
|
|||
mixed with other @var{files} and is read just once, the first time it
|
||||
appears in the command line.
|
||||
|
||||
Plzip supports the following options:
|
||||
plzip supports the following options:
|
||||
|
||||
@table @code
|
||||
@item -h
|
||||
|
@@ -190,12 +236,12 @@ garbage that can be safely ignored. @xref{concat-example}.
|
|||
@anchor{--data-size}
|
||||
@item -B @var{bytes}
|
||||
@itemx --data-size=@var{bytes}
|
||||
Set the size of the input data blocks, in bytes. The input file will be
|
||||
divided in chunks of this size before compression is performed. Valid
|
||||
values range from 8 KiB to 1 GiB. Default value is two times the
|
||||
dictionary size, except for option @samp{-0} where it defaults to 1 MiB.
|
||||
Plzip will reduce the dictionary size if it is larger than the chosen
|
||||
data size.
|
||||
When compressing, set the size of the input data blocks in bytes. The
|
||||
input file will be divided in chunks of this size before compression is
|
||||
performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value
|
||||
is two times the dictionary size, except for option @samp{-0} where it
|
||||
defaults to @w{1 MiB}. Plzip will reduce the dictionary size if it is
|
||||
larger than the chosen data size.
|
||||
|
||||
@item -c
|
||||
@itemx --stdout
|
||||
|
@@ -206,10 +252,10 @@ device.
|
|||
|
||||
@item -d
|
||||
@itemx --decompress
|
||||
Decompress the specified file(s). If a file does not exist or can't be
|
||||
Decompress the specified files. If a file does not exist or can't be
|
||||
opened, plzip continues decompressing the rest of the files. If a file
|
||||
fails to decompress, plzip exits immediately without decompressing the
|
||||
rest of the files.
|
||||
fails to decompress, or is a terminal, plzip exits immediately without
|
||||
decompressing the rest of the files.
|
||||
|
||||
@item -f
|
||||
@itemx --force
|
||||
|
@@ -217,8 +263,8 @@ Force overwrite of output files.
|
|||
|
||||
@item -F
|
||||
@itemx --recompress
|
||||
Force re-compression of files whose name already has the @samp{.lz} or
|
||||
@samp{.tlz} suffix.
|
||||
When compressing, force re-compression of files whose name already has
|
||||
the @samp{.lz} or @samp{.tlz} suffix.
|
||||
|
||||
@item -k
|
||||
@itemx --keep
|
||||
|
@@ -227,7 +273,7 @@ Keep (don't delete) input files during compression or decompression.
|
|||
@item -l
|
||||
@itemx --list
|
||||
Print the uncompressed size, compressed size and percentage saved of the
|
||||
specified file(s). Trailing data are ignored. The values produced are
|
||||
specified files. Trailing data are ignored. The values produced are
|
||||
correct even for multimember files. If more than one file is given, a
|
||||
final line containing the cumulative sizes is printed. With @samp{-v},
|
||||
the dictionary size, the number of members in the file, and the amount
|
||||
|
@@ -240,16 +286,21 @@ verifies that none of the specified files contain trailing data.
|
|||
|
||||
@item -m @var{bytes}
|
||||
@itemx --match-length=@var{bytes}
|
||||
Set the match length limit in bytes. After a match this long is found,
|
||||
the search is finished. Valid values range from 5 to 273. Larger values
|
||||
usually give better compression ratios but longer compression times.
|
||||
When compressing, set the match length limit in bytes. After a match
|
||||
this long is found, the search is finished. Valid values range from 5 to
|
||||
273. Larger values usually give better compression ratios but longer
|
||||
compression times.
|
||||
|
||||
@item -n @var{n}
|
||||
@itemx --threads=@var{n}
|
||||
Set the number of worker threads. Valid values range from 1 to "as many
|
||||
as your system can support". If this option is not used, plzip tries to
|
||||
detect the number of processors in the system and use it as default
|
||||
value. @w{@samp{plzip --help}} shows the system's default value.
|
||||
Set the number of worker threads, overriding the system's default. Valid
|
||||
values range from 1 to "as many as your system can support". If this
|
||||
option is not used, plzip tries to detect the number of processors in
|
||||
the system and use it as default value. When compressing on a @w{32 bit}
|
||||
system, plzip tries to limit the memory use to under @w{2.22 GiB} (4
|
||||
worker threads at level -9) by reducing the number of threads below the
|
||||
system's default. @w{@samp{plzip --help}} shows the system's default
|
||||
value.
|
||||
|
||||
Note that the number of usable threads is limited to @w{ceil( file_size
|
||||
/ data_size )} during compression (@pxref{Minimum file sizes}), and to
|
||||
|
@@ -260,7 +311,9 @@ the number of members in the input during decompression.
|
|||
When reading from standard input and @samp{--stdout} has not been
|
||||
specified, use @samp{@var{file}} as the virtual name of the uncompressed
|
||||
file. This produces a file named @samp{@var{file}} when decompressing,
|
||||
and a file named @samp{@var{file}.lz} when compressing.
|
||||
or a file named @samp{@var{file}.lz} when compressing. A second
|
||||
@samp{.lz} extension is not added if @samp{@var{file}} already ends in
|
||||
@samp{.lz} or @samp{.tlz}.
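The naming rule described above can be sketched as follows. This is only an illustration of the documented behavior, not plzip's actual code; the function name `output_name` and its parameters are hypothetical.

```python
def output_name(virtual_name: str, compressing: bool) -> str:
    """Return the output file name derived from a virtual input name.

    Hypothetical helper: mirrors the documented rule, not plzip's source.
    """
    if not compressing:
        # Decompression produces a file with the virtual name itself.
        return virtual_name
    # A second '.lz' is not added if the name already ends in '.lz' or '.tlz'.
    if virtual_name.endswith((".lz", ".tlz")):
        return virtual_name
    return virtual_name + ".lz"
```

For instance, `output_name("backup", True)` gives `backup.lz`, while `output_name("backup.tlz", True)` leaves the name unchanged.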
|
||||
|
||||
@item -q
|
||||
@itemx --quiet
|
||||
|
@ -268,12 +321,12 @@ Quiet operation. Suppress all messages.
|
|||
|
||||
@item -s @var{bytes}
|
||||
@itemx --dictionary-size=@var{bytes}
|
||||
Set the dictionary size limit in bytes. Plzip will use the smallest
|
||||
possible dictionary size for each file without exceeding this limit.
|
||||
Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are
|
||||
interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that
|
||||
dictionary sizes are quantized. If the specified size does not match one
|
||||
of the valid sizes, it will be rounded upwards by adding up to
|
||||
When compressing, set the dictionary size limit in bytes. Plzip will use
|
||||
the smallest possible dictionary size for each file without exceeding
|
||||
this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
|
||||
to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
|
||||
that dictionary sizes are quantized. If the specified size does not
|
||||
match one of the valid sizes, it will be rounded upwards by adding up to
|
||||
@w{(@var{bytes} / 8)} to it.
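As a rough sketch of how the argument is interpreted, the snippet below maps a numeric value to a size limit in bytes. It covers only the power-of-two shorthand and the valid range stated above; it does not model the quantization step or multiplier suffixes, and the name `dictionary_size_limit` is hypothetical.

```python
KIB = 1 << 10
MIB = 1 << 20
MIN_DICT = 4 * KIB      # smallest valid limit, 2^12 bytes
MAX_DICT = 512 * MIB    # largest valid limit, 2^29 bytes


def dictionary_size_limit(value: int) -> int:
    """Interpret a numeric dictionary-size argument as a limit in bytes.

    Hypothetical helper: quantization (rounding upwards) is not modelled.
    """
    if 12 <= value <= 29:
        # Values 12 to 29 are exponents: 2^12 to 2^29 bytes.
        return 1 << value
    if not MIN_DICT <= value <= MAX_DICT:
        raise ValueError("dictionary size out of range: %d" % value)
    return value
```

With this reading, `dictionary_size_limit(23)` and `dictionary_size_limit(8 * MIB)` denote the same limit.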
|
||||
|
||||
For maximum compression you should use a dictionary size limit as large
|
||||
|
@ -282,27 +335,29 @@ is affected at compression time by the choice of dictionary size limit.
|
|||
|
||||
@item -t
|
||||
@itemx --test
|
||||
Check integrity of the specified file(s), but don't decompress them.
|
||||
This really performs a trial decompression and throws away the result.
|
||||
Use it together with @samp{-v} to see information about the file(s). If
|
||||
a file does not exist, can't be opened, or is a terminal, plzip
|
||||
continues checking the rest of the files. If a file fails the test,
|
||||
plzip may be unable to check the rest of the files.
|
||||
Check integrity of the specified files, but don't decompress them. This
|
||||
really performs a trial decompression and throws away the result. Use it
|
||||
together with @samp{-v} to see information about the files. If a file
|
||||
does not exist, can't be opened, or is a terminal, plzip continues
|
||||
checking the rest of the files. If a file fails the test, plzip may be
|
||||
unable to check the rest of the files.
|
||||
|
||||
@item -v
|
||||
@itemx --verbose
|
||||
Verbose mode.@*
|
||||
When compressing, show the compression ratio for each file processed. A
|
||||
second @samp{-v} shows the progress of compression.@*
|
||||
When compressing, show the compression ratio and size for each file
|
||||
processed.@*
|
||||
When decompressing or testing, further -v's (up to 4) increase the
|
||||
verbosity level, showing status, compression ratio, dictionary size,
|
||||
decompressed size, and compressed size.
|
||||
decompressed size, and compressed size.@*
|
||||
Two or more @samp{-v} options show the progress of (de)compression,
|
||||
except for single-member files.
|
||||
|
||||
@item -0 .. -9
|
||||
Set the compression parameters (dictionary size and match length limit)
|
||||
as shown in the table below. The default compression level is @samp{-6}.
|
||||
Note that @samp{-9} can be much slower than @samp{-0}. These options
|
||||
have no effect when decompressing.
|
||||
have no effect when decompressing, testing or listing.
|
||||
|
||||
The bidimensional parameter space of LZMA can't be mapped to a linear
|
||||
scale optimal for all files. If your files are large, very repetitive,
|
||||
|
@ -327,6 +382,12 @@ etc, you may need to use the @samp{--dictionary-size} and
|
|||
@itemx --best
|
||||
Aliases for GNU gzip compatibility.
|
||||
|
||||
@item --loose-trailing
|
||||
When decompressing, testing or listing, allow trailing data whose first
|
||||
bytes are so similar to the magic bytes of a lzip header that they can
|
||||
be confused with a corrupt header. Use this option if a file triggers a
|
||||
"corrupt header" error and the cause is not indeed a corrupt header.
|
||||
|
||||
@end table
|
||||
|
||||
Numbers given as arguments to options may be followed by a multiplier
|
||||
|
@ -363,8 +424,8 @@ creating a multimember compressed file.
|
|||
|
||||
When decompressing, plzip decompresses as many members simultaneously as
|
||||
worker threads are chosen. Files that were compressed with lzip will not
|
||||
be decompressed faster than using lzip (unless the @samp{-b} option was
|
||||
used) because lzip usually produces single-member files, which can't be
|
||||
be decompressed faster than using lzip (unless the @samp{-b} option was used)
|
||||
because lzip usually produces single-member files, which can't be
|
||||
decompressed in parallel.
|
||||
|
||||
For each input file, a splitter thread and several worker threads are
|
||||
|
@ -377,6 +438,19 @@ to the workers. The workers (de)compress the blocks received from the
|
|||
splitter. The muxer collects processed packets from the workers, and
|
||||
writes them to the output file.
|
||||
|
||||
@verbatim
|
||||
,------------,
|
||||
,-->| worker 0 |--,
|
||||
| `------------' |
|
||||
,-------, ,----------, | ,------------, | ,-------, ,--------,
|
||||
| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output |
|
||||
| file | `----------' | `------------' | `-------' | file |
|
||||
`-------' | ... | `--------'
|
||||
| ,------------, |
|
||||
`-->| worker N-1 |--'
|
||||
`------------'
|
||||
@end verbatim
|
||||
|
||||
When decompressing from a regular file, the splitter is removed and the
|
||||
workers read directly from the input file. If the output file is also a
|
||||
regular file, the muxer is also removed and the workers write directly
|
||||
|
@ -472,35 +546,60 @@ facilitates safe recovery of undamaged members from multimember files.
|
|||
@chapter Memory required to compress and decompress
|
||||
@cindex memory requirements
|
||||
|
||||
The amount of memory required @strong{per thread} is approximately the
|
||||
following:
|
||||
The amount of memory required @strong{per thread} for decompression or
|
||||
testing is approximately the following:
|
||||
|
||||
@itemize @bullet
|
||||
@item
|
||||
For compression at level -0; 1.5 MiB plus 3 times the data size
|
||||
(@pxref{--data-size}). Default is 4.5 MiB.
|
||||
|
||||
@item
|
||||
For compression at other levels; 11 times the dictionary size plus 3
|
||||
times the data size. Default is 136 MiB.
|
||||
|
||||
@item
|
||||
For decompression of a regular (seekable) file to another regular file,
|
||||
or for testing of a regular file; the dictionary size.
|
||||
|
||||
@item
|
||||
For testing of a non-seekable file or of standard input; the dictionary
|
||||
size plus up to 5 MiB.
|
||||
size plus up to @w{5 MiB}.
|
||||
|
||||
@item
|
||||
For decompression of a regular file to a non-seekable file or to
|
||||
standard output; the dictionary size plus up to 32 MiB.
|
||||
standard output; the dictionary size plus up to @w{32 MiB}.
|
||||
|
||||
@item
|
||||
For decompression of a non-seekable file or of standard input; the
|
||||
dictionary size plus up to 35 MiB.
|
||||
dictionary size plus up to @w{35 MiB}.
|
||||
@end itemize
|
||||
|
||||
@noindent
|
||||
The amount of memory required @strong{per thread} for compression is
|
||||
approximately the following:
|
||||
|
||||
@itemize @bullet
|
||||
@item
|
||||
For compression at level -0; @w{1.5 MiB} plus 3.375 times the data size
|
||||
(@pxref{--data-size}). Default is @w{4.875 MiB}.
|
||||
|
||||
@item
|
||||
For compression at other levels; 11 times the dictionary size plus 3.375
|
||||
times the data size. Default is @w{142 MiB}.
|
||||
@end itemize
|
||||
|
||||
@noindent
|
||||
The following table shows the memory required @strong{per thread} for
|
||||
compression at a given level, using the default data size for each
|
||||
level:
|
||||
|
||||
@multitable {Level} {Memory required}
|
||||
@item Level @tab Memory required
|
||||
@item -0 @tab 4.875 MiB
|
||||
@item -1 @tab 17.75 MiB
|
||||
@item -2 @tab 26.625 MiB
|
||||
@item -3 @tab 35.5 MiB
|
||||
@item -4 @tab 53.25 MiB
|
||||
@item -5 @tab 71 MiB
|
||||
@item -6 @tab 142 MiB
|
||||
@item -7 @tab 284 MiB
|
||||
@item -8 @tab 426 MiB
|
||||
@item -9 @tab 568 MiB
|
||||
@end multitable
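The table above can be reproduced from the two compression formulas. The sketch below assumes the default dictionary size for each level (taken from the lzip level table, not stated in this section) and assumes that the default data size is twice the dictionary size, or 1 MiB at level -0. The names `DICT_MIB` and `compression_memory_mib` are hypothetical.

```python
# Assumed default dictionary sizes in MiB for levels -1 to -9.
DICT_MIB = {1: 1, 2: 1.5, 3: 2, 4: 3, 5: 4, 6: 8, 7: 16, 8: 24, 9: 32}


def compression_memory_mib(level, data_size_mib=None):
    """Approximate memory per worker thread, in MiB, when compressing."""
    if level == 0:
        # Level -0: 1.5 MiB plus 3.375 times the data size (assumed 1 MiB).
        data = 1 if data_size_mib is None else data_size_mib
        return 1.5 + 3.375 * data
    dict_mib = DICT_MIB[level]
    # Assumed default data size: twice the dictionary size.
    data = 2 * dict_mib if data_size_mib is None else data_size_mib
    # Other levels: 11 times the dictionary size plus 3.375 times the data size.
    return 11 * dict_mib + 3.375 * data
```

Under these assumptions, four worker threads at level -9 need 4 * 568 MiB = 2272 MiB, about 2.22 GiB, which agrees with the limit quoted for 32 bit systems under @samp{--threads}.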
|
||||
|
||||
|
||||
@node Minimum file sizes
|
||||
@chapter Minimum file sizes required for full compression speed
|
||||
|
@ -516,7 +615,8 @@ least as large as the number of worker threads times the chunk size
|
|||
(@pxref{--data-size}). Else some processors will not get any data to
|
||||
compress, and compression will be proportionally slower. The maximum
|
||||
speed increase achievable on a given file is limited by the ratio
|
||||
@w{(file_size / data_size)}.
|
||||
@w{(file_size / data_size)}. For example, a tarball the size of gcc or
|
||||
linux will scale up to 8 processors at level -9.
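The limit on parallelism can be sketched as below. This is an illustration under stated assumptions: the function name `usable_threads` is hypothetical, the file is assumed to be non-empty, and the 64 MiB data size used in the usage note is the assumed default for level -9.

```python
MIB = 1 << 20


def usable_threads(file_size: int, data_size: int, requested: int) -> int:
    """Number of worker threads that can receive data when compressing.

    Bounded by ceil(file_size / data_size); assumes file_size > 0.
    """
    chunks = -(-file_size // data_size)  # integer ceiling division
    return min(requested, chunks)
```

For example, with a 64 MiB data size a 512 MiB input keeps at most 8 threads busy, however many are requested, and a 100 MiB input keeps at most 2.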
|
||||
|
||||
The following table shows the minimum uncompressed file size needed for
|
||||
full use of N processors at a given compression level, using the default
|
||||
|
@ -554,9 +654,10 @@ padding zero bytes to a lzip file.
|
|||
@item
|
||||
Useful data added by the user; a cryptographically secure hash, a
|
||||
description of file contents, etc. It is safe to append any amount of
|
||||
text to a lzip file as long as the text does not begin with the string
|
||||
"LZIP", and does not contain any zero bytes (null characters). Nonzero
|
||||
bytes and zero bytes can't be safely mixed in trailing data.
|
||||
text to a lzip file as long as none of the first four bytes of the text
|
||||
match the corresponding byte in the string "LZIP", and the text does not
|
||||
contain any zero bytes (null characters). Nonzero bytes and zero bytes
|
||||
can't be safely mixed in trailing data.
|
||||
|
||||
@item
|
||||
Garbage added by some not totally successful copy operation.
|
||||
|
@ -566,12 +667,16 @@ Malicious data added to the file in order to make its total size and
|
|||
hash value (for a chosen hash) coincide with those of another file.
|
||||
|
||||
@item
|
||||
In very rare cases, trailing data could be the corrupt header of another
|
||||
In rare cases, trailing data could be the corrupt header of another
|
||||
member. In multimember or concatenated files the probability of
|
||||
corruption happening in the magic bytes is 5 times smaller than the
|
||||
probability of getting a false positive caused by the corruption of the
|
||||
integrity information itself. Therefore it can be considered to be below
|
||||
the noise level.
|
||||
the noise level. Additionally, the test used by plzip to discriminate
|
||||
trailing data from a corrupt header has a Hamming distance (HD) of 3,
|
||||
and the 3 bit flips must happen in different magic bytes for the test to
|
||||
fail. In any case, the option @samp{--trailing-error} guarantees that
|
||||
any corrupt header will be detected.
|
||||
@end itemize
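The rule given above for text appended by the user can be sketched as a small check. It only restates the documented condition (no byte of the first four equal to the corresponding byte of "LZIP", and no zero bytes); it is not the test plzip itself uses to tell trailing data from a corrupt header, and the name `is_safe_trailing_text` is hypothetical.

```python
MAGIC = b"LZIP"


def is_safe_trailing_text(text: bytes) -> bool:
    """Return True if text meets the documented rule for appended text."""
    if b"\0" in text:
        # Zero bytes must not be mixed with nonzero bytes in trailing data.
        return False
    # None of the first four bytes may match the same position in "LZIP".
    return all(t != m for t, m in zip(text[:4], MAGIC))
```

A line such as `sha256: ...` passes this check, whereas text starting with `LZ`, or text whose second byte happens to be `Z`, does not.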
|
||||
|
||||
Trailing data are in no way part of the lzip file format, but tools
|
||||
|
@ -607,7 +712,7 @@ plzip -v file
|
|||
@sp 1
|
||||
@noindent
|
||||
Example 2: Like example 1 but the created @samp{file.lz} has a block
|
||||
size of 1 MiB. The compression ratio is not shown.
|
||||
size of @w{1 MiB}. The compression ratio is not shown.
|
||||
|
||||
@example
|
||||
plzip -B 1MiB file
|
||||
|
@ -656,7 +761,7 @@ Do this instead
|
|||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 7: Decompress @samp{file.lz} partially until 10 KiB of
|
||||
Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of
|
||||
decompressed data are produced.
|
||||
|
||||
@example
|
||||
|
|