Merging upstream version 1.11.
Signed-off-by: Daniel Baumann <daniel@debian.org>
Parent: 2b58741015
Commit: 648618884e
21 changed files with 727 additions and 631 deletions
234
doc/plzip.info
|
@ -11,13 +11,13 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Plzip Manual
|
||||
************
|
||||
|
||||
This manual is for Plzip (version 1.10, 24 January 2022).
|
||||
This manual is for Plzip (version 1.11, 21 January 2024).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of plzip
|
||||
* Output:: Meaning of plzip's output
|
||||
* Invoking plzip:: Command line interface
|
||||
* Invoking plzip:: Command-line interface
|
||||
* Program design:: Internal structure of plzip
|
||||
* Memory requirements:: Memory required to compress and decompress
|
||||
* Minimum file sizes:: Minimum file sizes required for full speed
|
||||
|
@ -28,7 +28,7 @@ This manual is for Plzip (version 1.10, 24 January 2022).
|
|||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2009-2022 Antonio Diaz Diaz.
|
||||
Copyright (C) 2009-2024 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
@ -39,19 +39,20 @@ File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
|
|||
1 Introduction
|
||||
**************
|
||||
|
||||
Plzip is a massively parallel (multi-threaded) implementation of lzip, fully
|
||||
Plzip is a massively parallel (multi-threaded) implementation of lzip,
|
||||
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
|
||||
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
|
||||
chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
|
||||
checking to maximize interoperability and optimize safety. Lzip can compress
|
||||
about as fast as gzip (lzip -0) or compress most files more than bzip2
|
||||
(lzip -9). Decompression speed is intermediate between gzip and bzip2. Lzip
|
||||
is better than gzip and bzip2 from a data recovery perspective. Lzip has
|
||||
been designed, written, and tested with great care to replace gzip and
|
||||
bzip2 as the standard general-purpose compressed format for unix-like
|
||||
systems.
|
||||
chain-Algorithm' (LZMA) stream format to maximize interoperability. The
|
||||
maximum dictionary size is 512 MiB so that any lzip file can be decompressed
|
||||
on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
|
||||
checking. Lzip can compress about as fast as gzip (lzip -0) or compress most
|
||||
files more than bzip2 (lzip -9). Decompression speed is intermediate between
|
||||
gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
|
||||
perspective. Lzip has been designed, written, and tested with great care to
|
||||
replace gzip and bzip2 as the standard general-purpose compressed format for
|
||||
Unix-like systems.
|
||||
|
||||
Plzip can compress/decompress large files on multiprocessor machines much
|
||||
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
|
||||
|
@ -94,10 +95,10 @@ byte near the beginning is a thing of the past.
|
|||
makes it safer than compressors returning ambiguous warning values (like
|
||||
gzip) when it is used as a back end for other programs like tar or zutils.
|
||||
|
||||
Plzip will automatically use for each file the largest dictionary size
|
||||
that does not exceed neither the file size nor the limit given. Keep in
|
||||
mind that the decompression memory requirement is affected at compression
|
||||
time by the choice of dictionary size limit. *Note Memory requirements::.
|
||||
Plzip automatically uses for each file the largest dictionary size that
|
||||
does not exceed neither the file size nor the limit given. Keep in mind
|
||||
that the decompression memory requirement is affected at compression time
|
||||
by the choice of dictionary size limit. *Note Memory requirements::.
|
||||
|
||||
When compressing, plzip replaces every file given in the command line
|
||||
with a compressed version of itself, with the name "original_name.lz". When
|
||||
|
@ -109,22 +110,22 @@ filename.tlz becomes filename.tar
|
|||
anyothername becomes anyothername.out
|
||||
|
||||
(De)compressing a file is much like copying or moving it. Therefore plzip
|
||||
preserves the access and modification dates, permissions, and, when
|
||||
possible, ownership of the file just as 'cp -p' does. (If the user ID or
|
||||
the group ID can't be duplicated, the file permission bits S_ISUID and
|
||||
S_ISGID are cleared).
|
||||
preserves the access and modification dates, permissions, and, if you have
|
||||
appropriate privileges, ownership of the file just as 'cp -p' does. (If the
|
||||
user ID or the group ID can't be duplicated, the file permission bits
|
||||
S_ISUID and S_ISGID are cleared).
|
||||
|
||||
Plzip is able to read from some types of non-regular files if either the
|
||||
option '-c' or the option '-o' is specified.
|
||||
|
||||
Plzip will refuse to read compressed data from a terminal or write
|
||||
compressed data to a terminal, as this would be entirely incomprehensible
|
||||
and might leave the terminal in an abnormal state.
|
||||
Plzip refuses to read compressed data from a terminal or write compressed
|
||||
data to a terminal, as this would be entirely incomprehensible and might
|
||||
leave the terminal in an abnormal state.
|
||||
|
||||
Plzip will correctly decompress a file which is the concatenation of two
|
||||
or more compressed files. The result is the concatenation of the
|
||||
corresponding decompressed files. Integrity testing of concatenated
|
||||
compressed files is also supported.
|
||||
Plzip correctly decompresses a file which is the concatenation of two or
|
||||
more compressed files. The result is the concatenation of the corresponding
|
||||
decompressed files. Integrity testing of concatenated compressed files is
|
||||
also supported.
|
||||
|
||||
|
||||
File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top
|
||||
|
@ -185,7 +186,8 @@ The format for running plzip is:
|
|||
If no file names are specified, plzip compresses (or decompresses) from
|
||||
standard input to standard output. A hyphen '-' used as a FILE argument
|
||||
means standard input. It can be mixed with other FILES and is read just
|
||||
once, the first time it appears in the command line.
|
||||
once, the first time it appears in the command line. Remember to prepend
|
||||
'./' to any file name beginning with a hyphen, or use '--'.
|
||||
|
||||
plzip supports the following options: *Note Argument syntax:
|
||||
(arg_parser)Argument syntax.
|
||||
|
@ -208,30 +210,32 @@ once, the first time it appears in the command line.
|
|||
'-B BYTES'
|
||||
'--data-size=BYTES'
|
||||
When compressing, set the size in bytes of the input data blocks. The
|
||||
input file will be divided in chunks of this size before compression is
|
||||
input file is divided in chunks of this size before compression is
|
||||
performed. Valid values range from 8 KiB to 1 GiB. Default value is
|
||||
two times the dictionary size, except for option '-0' where it
|
||||
defaults to 1 MiB. Plzip will reduce the dictionary size if it is
|
||||
larger than the data size specified. *Note Minimum file sizes::.
|
||||
defaults to 1 MiB. Plzip reduces the dictionary size if it is larger
|
||||
than the data size specified. *Note Minimum file sizes::.
|
||||
|
||||
'-c'
|
||||
'--stdout'
|
||||
Compress or decompress to standard output; keep input files unchanged.
|
||||
If compressing several files, each file is compressed independently.
|
||||
This option (or '-o') is needed when reading from a named pipe (fifo)
|
||||
or from a device. Use 'lziprecover -cd -i' to recover as much of the
|
||||
decompressed data as possible when decompressing a corrupt file. '-c'
|
||||
overrides '-o'. '-c' has no effect when testing or listing.
|
||||
(The output consists of a sequence of independently compressed
|
||||
members). This option (or '-o') is needed when reading from a named
|
||||
pipe (fifo) or from a device. Use 'lziprecover -cd -i' to recover as
|
||||
much of the decompressed data as possible when decompressing a corrupt
|
||||
file. '-c' overrides '-o'. '-c' has no effect when testing or listing.
|
||||
|
||||
'-d'
|
||||
'--decompress'
|
||||
Decompress the files specified. If a file does not exist, can't be
|
||||
opened, or the destination file already exists and '--force' has not
|
||||
been specified, plzip continues decompressing the rest of the files
|
||||
and exits with error status 1. If a file fails to decompress, or is a
|
||||
terminal, plzip exits immediately with error status 2 without
|
||||
decompressing the rest of the files. A terminal is considered an
|
||||
uncompressed file, and therefore invalid.
|
||||
Decompress the files specified. The integrity of the files specified is
|
||||
checked. If a file does not exist, can't be opened, or the destination
|
||||
file already exists and '--force' has not been specified, plzip
|
||||
continues decompressing the rest of the files and exits with error
|
||||
status 1. If a file fails to decompress, or is a terminal, plzip exits
|
||||
immediately with error status 2 without decompressing the rest of the
|
||||
files. A terminal is considered an uncompressed file, and therefore
|
||||
invalid.
|
||||
|
||||
'-f'
|
||||
'--force'
|
||||
|
@ -258,18 +262,18 @@ once, the first time it appears in the command line.
|
|||
printed.
|
||||
|
||||
If any file is damaged, does not exist, can't be opened, or is not
|
||||
regular, the final exit status will be > 0. '-lq' can be used to verify
|
||||
regular, the final exit status is > 0. '-lq' can be used to check
|
||||
quickly (without decompressing) the structural integrity of the files
|
||||
specified. (Use '--test' to verify the data integrity). '-alq'
|
||||
additionally verifies that none of the files specified contain
|
||||
trailing data.
|
||||
specified. (Use '--test' to check the data integrity). '-alq'
|
||||
additionally checks that none of the files specified contain trailing
|
||||
data.
|
||||
|
||||
'-m BYTES'
|
||||
'--match-length=BYTES'
|
||||
When compressing, set the match length limit in bytes. After a match
|
||||
this long is found, the search is finished. Valid values range from 5
|
||||
to 273. Larger values usually give better compression ratios but longer
|
||||
compression times.
|
||||
to 273. Larger values usually give better compression ratios but
|
||||
longer compression times.
|
||||
|
||||
'-n N'
|
||||
'--threads=N'
|
||||
|
@ -291,10 +295,12 @@ once, the first time it appears in the command line.
|
|||
|
||||
'-o FILE'
|
||||
'--output=FILE'
|
||||
If '-c' has not been also specified, write the (de)compressed output to
|
||||
FILE; keep input files unchanged. If compressing several files, each
|
||||
file is compressed independently. This option (or '-c') is needed when
|
||||
reading from a named pipe (fifo) or from a device. '-o -' is
|
||||
If '-c' has not been also specified, write the (de)compressed output
|
||||
to FILE, automatically creating any missing parent directories; keep
|
||||
input files unchanged. If compressing several files, each file is
|
||||
compressed independently. (The output consists of a sequence of
|
||||
independently compressed members). This option (or '-c') is needed
|
||||
when reading from a named pipe (fifo) or from a device. '-o -' is
|
||||
equivalent to '-c'. '-o' has no effect when testing or listing.
|
||||
|
||||
In order to keep backward compatibility with plzip versions prior to
|
||||
|
@ -311,14 +317,14 @@ once, the first time it appears in the command line.
|
|||
|
||||
'-s BYTES'
|
||||
'--dictionary-size=BYTES'
|
||||
When compressing, set the dictionary size limit in bytes. Plzip will
|
||||
use for each file the largest dictionary size that does not exceed
|
||||
neither the file size nor this limit. Valid values range from 4 KiB to
|
||||
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
|
||||
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
|
||||
coded in just one byte (*note coded-dict-size::). If the size specified
|
||||
does not match one of the valid sizes, it will be rounded upwards by
|
||||
adding up to (BYTES / 8) to it.
|
||||
When compressing, set the dictionary size limit in bytes. Plzip uses
|
||||
for each file the largest dictionary size that does not exceed neither
|
||||
the file size nor this limit. Valid values range from 4 KiB to 512 MiB.
|
||||
Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29
|
||||
bytes. Dictionary sizes are quantized so that they can be coded in
|
||||
just one byte (*note coded-dict-size::). If the size specified does
|
||||
not match one of the valid sizes, it is rounded upwards by adding up
|
||||
to (BYTES / 8) to it.
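
The rounding rule described for '--dictionary-size' can be illustrated with a small sketch. It is not plzip's code. It assumes the one-byte coding from the lzip file format description (a base size of 2^12 to 2^29 minus 0 to 7 sixteenths of that base), which is not shown in this hunk, and the names 'valid_sizes' and 'quantize' are hypothetical.

```python
# Illustrative sketch only; not taken from plzip's source.
# Assumption: a valid dictionary size is 2^n minus k/16 of 2^n,
# with n in 12..29 and k in 0..7, clipped to 4 KiB .. 512 MiB.
MIN_DICT = 1 << 12   # 4 KiB
MAX_DICT = 1 << 29   # 512 MiB


def valid_sizes():
    """Return every size representable under the assumed coding, sorted."""
    sizes = set()
    for log2 in range(12, 30):
        base = 1 << log2
        for sixteenths in range(8):
            size = base - sixteenths * (base // 16)
            if MIN_DICT <= size <= MAX_DICT:
                sizes.add(size)
    return sorted(sizes)


def quantize(nbytes):
    """Round nbytes upwards to the nearest representable dictionary size."""
    if not MIN_DICT <= nbytes <= MAX_DICT:
        raise ValueError("dictionary size out of range")
    return min(s for s in valid_sizes() if s >= nbytes)


if __name__ == "__main__":
    requested = 300 * 1024
    print(requested, "->", quantize(requested))
```

Under this assumption the step between neighbouring valid sizes is always less than one eighth of the smaller one, which is consistent with the "adding up to (BYTES / 8)" bound stated above.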
|
||||
|
||||
For maximum compression you should use a dictionary size limit as large
|
||||
as possible, but keep in mind that the decompression memory requirement
|
||||
|
@ -330,7 +336,7 @@ once, the first time it appears in the command line.
|
|||
really performs a trial decompression and throws away the result. Use
|
||||
it together with '-v' to see information about the files. If a file
|
||||
fails the test, does not exist, can't be opened, or is a terminal,
|
||||
plzip continues checking the rest of the files. A final diagnostic is
|
||||
plzip continues testing the rest of the files. A final diagnostic is
|
||||
shown at verbosity level 1 or higher if any file fails the test when
|
||||
testing multiple files.
|
||||
|
||||
|
@ -408,26 +414,29 @@ once, the first time it appears in the command line.
|
|||
(lzlib)Library version.
|
||||
|
||||
|
||||
Numbers given as arguments to options may be followed by a multiplier
|
||||
and an optional 'B' for "byte".
|
||||
Numbers given as arguments to options may be expressed in decimal,
|
||||
hexadecimal, or octal (using the same syntax as integer constants in C++),
|
||||
and may be followed by a multiplier and an optional 'B' for "byte".
|
||||
|
||||
Table of SI and binary prefixes (unit multipliers):
|
||||
|
||||
Prefix Value | Prefix Value
|
||||
k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024)
|
||||
M megabyte (10^6) | Mi mebibyte (2^20)
|
||||
G gigabyte (10^9) | Gi gibibyte (2^30)
|
||||
T terabyte (10^12) | Ti tebibyte (2^40)
|
||||
P petabyte (10^15) | Pi pebibyte (2^50)
|
||||
E exabyte (10^18) | Ei exbibyte (2^60)
|
||||
Z zettabyte (10^21) | Zi zebibyte (2^70)
|
||||
Y yottabyte (10^24) | Yi yobibyte (2^80)
|
||||
Prefix Value | Prefix Value
|
||||
k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024)
|
||||
M megabyte (10^6) | Mi mebibyte (2^20)
|
||||
G gigabyte (10^9) | Gi gibibyte (2^30)
|
||||
T terabyte (10^12) | Ti tebibyte (2^40)
|
||||
P petabyte (10^15) | Pi pebibyte (2^50)
|
||||
E exabyte (10^18) | Ei exbibyte (2^60)
|
||||
Z zettabyte (10^21) | Zi zebibyte (2^70)
|
||||
Y yottabyte (10^24) | Yi yobibyte (2^80)
|
||||
R ronnabyte (10^27) | Ri robibyte (2^90)
|
||||
Q quettabyte (10^30) | Qi quebibyte (2^100)
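
As an illustration of the number syntax described above (decimal, hexadecimal, or octal, followed by an optional multiplier and an optional 'B'), here is a minimal sketch of a parser. It is not plzip's actual argument parser. The function name 'parse_number' is hypothetical, and the handling of ambiguous inputs (for example a hexadecimal number ending in 'B' or 'E') is an assumption that may differ from the real program.

```python
import re

# Prefix letters in increasing order of magnitude, as in the table above.
# 'k' is the SI form; 'Ki' is the binary form.
_PREFIXES = "KMGTPEZYRQ"

# number, optional multiplier (k, Ki, or M..Q with optional 'i'), optional 'B'
_NUMBER_RE = re.compile(r"(0[xX][0-9a-fA-F]+|[0-9]+)(k|Ki|[MGTPEZYRQ]i?)?B?")


def parse_number(text):
    """Parse strings such as '8KiB', '64k', '0x1000', or '010' into an int."""
    match = _NUMBER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"bad numerical argument: {text!r}")
    digits, prefix = match.groups()
    if digits[:2].lower() == "0x":
        value = int(digits, 16)          # hexadecimal, as in C++
    elif len(digits) > 1 and digits[0] == "0":
        value = int(digits, 8)           # leading zero means octal, as in C++
    else:
        value = int(digits, 10)
    if prefix:
        exponent = _PREFIXES.index(prefix[0].upper()) + 1
        base = 1024 if prefix.endswith("i") else 1000
        value *= base ** exponent
    return value


if __name__ == "__main__":
    for arg in ("8KiB", "64k", "0x1000", "010", "1MiB"):
        print(arg, "=", parse_number(arg))
```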
|
||||
|
||||
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file not
|
||||
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
|
||||
input file, 3 for an internal consistency error (e.g., bug) which caused
|
||||
plzip to panic.
|
||||
found, invalid command-line options, I/O errors, etc), 2 to indicate a
|
||||
corrupt or invalid input file, 3 for an internal consistency error (e.g.,
|
||||
bug) which caused plzip to panic.
|
||||
|
||||
|
||||
File: plzip.info, Node: Program design, Next: Memory requirements, Prev: Invoking plzip, Up: Top
|
||||
|
@ -441,7 +450,7 @@ multimember compressed file. Each chunk is compressed in-place (using the
|
|||
same buffer for input and output), reducing the amount of RAM required.
|
||||
|
||||
When decompressing, plzip decompresses as many members simultaneously as
|
||||
worker threads are chosen. Files that were compressed with lzip will not be
|
||||
worker threads are chosen. Files that were compressed with lzip are not
|
||||
decompressed faster than using lzip (unless the option '-b' was used)
|
||||
because lzip usually produces single-member files, which can't be
|
||||
decompressed in parallel.
|
||||
|
@ -535,10 +544,10 @@ multimember compressed file.
|
|||
For this to work as expected (and roughly multiply the compression speed
|
||||
by the number of available processors), the uncompressed file must be at
|
||||
least as large as the number of worker threads times the chunk size (*note
|
||||
--data-size::). Else some processors will not get any data to compress, and
|
||||
compression will be proportionally slower. The maximum speed increase
|
||||
achievable on a given file is limited by the ratio (file_size / data_size).
|
||||
For example, a tarball the size of gcc or linux will scale up to 10 or 14
|
||||
--data-size::). Else some processors do not get any data to compress, and
|
||||
compression is proportionally slower. The maximum speed increase achievable
|
||||
on a given file is limited by the ratio (file_size / data_size). For
|
||||
example, a tarball the size of gcc or linux scales up to 10 or 14
|
||||
processors at level -9.
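
The relation between file size, chunk size, and worker threads stated above can be written out as a small sketch. The helper names are hypothetical, and the numbers are only the arithmetic implied by the text, not measurements.

```python
import math

MiB = 1 << 20


def min_file_size(threads, data_size):
    """Smallest uncompressed size that gives every worker thread a chunk."""
    return threads * data_size


def usable_threads(file_size, data_size, threads):
    """Worker threads that actually receive data for a given file.

    The file is split into ceil(file_size / data_size) chunks, so the
    speed increase cannot exceed that number of chunks.
    """
    chunks = math.ceil(file_size / data_size)
    return min(threads, chunks)


if __name__ == "__main__":
    # Example: 16 MiB chunks and 8 worker threads.
    print(min_file_size(8, 16 * MiB) // MiB, "MiB needed for full speed")
    print(usable_threads(100 * MiB, 16 * MiB, 8), "threads get data for a 100 MiB file")
```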
|
||||
|
||||
The following table shows the minimum uncompressed file size needed for
|
||||
|
@ -585,7 +594,7 @@ when there is no longer anything to take away.
|
|||
represents a variable number of bytes.
|
||||
|
||||
|
||||
A lzip file consists of a series of independent "members" (compressed
|
||||
A lzip file consists of one or more independent "members" (compressed
|
||||
data sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them. Each member can
|
||||
encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
|
||||
|
@ -629,10 +638,10 @@ size of a multimember file is unlimited.
|
|||
|
||||
'Member size (8 bytes)'
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, allows the verification of stream integrity,
|
||||
and facilitates the safe recovery of undamaged members from
|
||||
multimember files. Member size should be limited to 2 PiB to prevent
|
||||
the data size field from overflowing.
|
||||
as a distributed index, improves the checking of stream integrity, and
|
||||
facilitates the safe recovery of undamaged members from multimember
|
||||
files. Lzip limits the member size to 2 PiB to prevent the data size
|
||||
field from overflowing.
|
||||
|
||||
|
||||
|
||||
|
@ -648,12 +657,13 @@ member. Such trailing data may be:
|
|||
example when writing to a tape. It is safe to append any amount of
|
||||
padding zero bytes to a lzip file.
|
||||
|
||||
* Useful data added by the user; a cryptographically secure hash, a
|
||||
* Useful data added by the user; an "End Of File" string (to check that
|
||||
the file has not been truncated), a cryptographically secure hash, a
|
||||
description of file contents, etc. It is safe to append any amount of
|
||||
text to a lzip file as long as none of the first four bytes of the text
|
||||
match the corresponding byte in the string "LZIP", and the text does
|
||||
not contain any zero bytes (null characters). Nonzero bytes and zero
|
||||
bytes can't be safely mixed in trailing data.
|
||||
text to a lzip file as long as none of the first four bytes of the
|
||||
text matches the corresponding byte in the string "LZIP", and the text
|
||||
does not contain any zero bytes (null characters). Nonzero bytes and
|
||||
zero bytes can't be safely mixed in trailing data.
|
||||
|
||||
* Garbage added by some not totally successful copy operation.
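
The condition given above for safely appending text to a lzip file can be checked mechanically. The following is a minimal sketch of that rule as worded in the manual; the function name 'is_safe_trailing_text' is hypothetical and plzip itself provides no such function.

```python
MAGIC = b"LZIP"


def is_safe_trailing_text(text):
    """Return True if text (bytes) satisfies the rule for trailing text.

    The rule: none of the first four bytes may match the corresponding
    byte of "LZIP", and the text must not contain any zero bytes.
    """
    if b"\x00" in text:
        return False
    # zip() stops at the shorter input, so texts under four bytes also work.
    return all(a != b for a, b in zip(text, MAGIC))


if __name__ == "__main__":
    for sample in (b"End Of File", b"LZIP", b"aZbc", b"hash\x00"):
        print(sample, is_safe_trailing_text(sample))
```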
|
||||
|
||||
|
@ -669,7 +679,7 @@ member. Such trailing data may be:
|
|||
discriminate trailing data from a corrupt header has a Hamming
|
||||
distance (HD) of 3, and the 3 bit flips must happen in different magic
|
||||
bytes for the test to fail. In any case, the option '--trailing-error'
|
||||
guarantees that any corrupt header will be detected.
|
||||
guarantees that any corrupt header is detected.
|
||||
|
||||
Trailing data are in no way part of the lzip file format, but tools
|
||||
reading lzip files are expected to behave as correctly and usefully as
|
||||
|
@ -689,12 +699,12 @@ File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: T
|
|||
WARNING! Even if plzip is bug-free, other causes may result in a corrupt
|
||||
compressed file (bugs in the system libraries, memory errors, etc).
|
||||
Therefore, if the data you are going to compress are important, give the
|
||||
option '--keep' to plzip and don't remove the original file until you
|
||||
verify the compressed file with a command like
|
||||
'plzip -cd file.lz | cmp file -'. Most RAM errors happening during
|
||||
compression can only be detected by comparing the compressed file with the
|
||||
original because the corruption happens before plzip compresses the RAM
|
||||
contents, resulting in a valid compressed file containing wrong data.
|
||||
option '--keep' to plzip and don't remove the original file until you check
|
||||
the compressed file with a command like 'plzip -cd file.lz | cmp file -'.
|
||||
Most RAM errors happening during compression can only be detected by
|
||||
comparing the compressed file with the original because the corruption
|
||||
happens before plzip compresses the RAM contents, resulting in a valid
|
||||
compressed file containing wrong data.
|
||||
|
||||
|
||||
Example 1: Extract all the files from archive 'foo.tar.lz'.
|
||||
|
@ -722,7 +732,7 @@ the operation is successful, 'file.lz' is removed.
|
|||
plzip -d file.lz
|
||||
|
||||
|
||||
Example 5: Verify the integrity of the compressed file 'file.lz' and show
|
||||
Example 5: Check the integrity of the compressed file 'file.lz' and show
|
||||
status.
|
||||
|
||||
plzip -tv file.lz
|
||||
|
@ -800,20 +810,20 @@ Concept index
|
|||
Tag Table:
|
||||
Node: Top217
|
||||
Node: Introduction1156
|
||||
Node: Output5829
|
||||
Node: Invoking plzip7392
|
||||
Ref: --trailing-error8187
|
||||
Ref: --data-size8425
|
||||
Node: Program design18819
|
||||
Node: Memory requirements21122
|
||||
Node: Minimum file sizes22807
|
||||
Node: File format24821
|
||||
Ref: coded-dict-size26260
|
||||
Node: Trailing data27514
|
||||
Node: Examples29775
|
||||
Ref: concat-example31210
|
||||
Node: Problems31967
|
||||
Node: Concept index32522
|
||||
Node: Output5934
|
||||
Node: Invoking plzip7497
|
||||
Ref: --trailing-error8372
|
||||
Ref: --data-size8610
|
||||
Node: Program design19519
|
||||
Node: Memory requirements21818
|
||||
Node: Minimum file sizes23503
|
||||
Node: File format25506
|
||||
Ref: coded-dict-size26945
|
||||
Node: Trailing data28195
|
||||
Node: Examples30531
|
||||
Ref: concat-example31964
|
||||
Node: Problems32721
|
||||
Node: Concept index33276
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|