Merging upstream version 1.7.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-02-24 04:15:24 +01:00
commit f6869e4fd3
parent 8bc0325467
20 changed files with 841 additions and 444 deletions
--- a/doc/plzip.info
+++ b/doc/plzip.info
@@ -11,11 +11,12 @@ File: plzip.info,  Node: Top,  Next: Introduction,  Up: (dir)
 Plzip Manual
 ************

-This manual is for Plzip (version 1.6, 12 April 2017).
+This manual is for Plzip (version 1.7, 7 February 2018).

 * Menu:

 * Introduction::           Purpose and features of plzip
+* Output::                 Meaning of plzip's output
 * Invoking plzip::         Command line interface
 * Program design::         Internal structure of plzip
 * File format::            Detailed format of the compressed file
@@ -27,13 +28,13 @@ This manual is for Plzip (version 1.6, 12 April 2017).
 * Concept index::          Index of concepts


-   Copyright (C) 2009-2017 Antonio Diaz Diaz.
+   Copyright (C) 2009-2018 Antonio Diaz Diaz.

   This manual is free documentation: you have unlimited permission to
 copy, distribute and modify it.


-File: plzip.info,  Node: Introduction,  Next: Invoking plzip,  Prev: Top,  Up: Top
+File: plzip.info,  Node: Introduction,  Next: Output,  Prev: Top,  Up: Top

 1 Introduction
 **************
@@ -58,7 +59,7 @@ archiving, taking into account both data integrity and decoder
 availability:

   * The lzip format provides very safe integrity checking and some data
-     recovery means. The lziprecover program can repair bit-flip errors
+     recovery means. The lziprecover program can repair bit flip errors
     (one of the most common forms of data corruption) in lzip files,
     and provides data recovery capabilities, including error-checked
     merging of damaged copies of a file.  *Note Data safety:
@@ -114,17 +115,60 @@ entirely incomprehensible and therefore pointless.

   Plzip will correctly decompress a file which is the concatenation of
 two or more compressed files. The result is the concatenation of the
-corresponding uncompressed files. Integrity testing of concatenated
+corresponding decompressed files. Integrity testing of concatenated
 compressed files is also supported.

+
+File: plzip.info,  Node: Output,  Next: Invoking plzip,  Prev: Introduction,  Up: Top
+
+2 Meaning of plzip's output
+***************************
+
+The output of plzip looks like this:
+
+     plzip -v foo
+       foo:  6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
+
+     plzip -tvv foo.lz
+       foo.lz:  6.676:1, 14.98% ratio, 85.02% saved.  ok
+
+   The meaning of each field is as follows:
+
+'N:1'
+     The compression ratio (uncompressed_size / compressed_size), shown
+     as N to 1.
+
+'ratio'
+     The inverse compression ratio
+     (compressed_size / uncompressed_size), shown as a percentage. A
+     decimal ratio is easily obtained by moving the decimal point two
+     places to the left; 14.98% = 0.1498.
+
+'saved'
+     The space saved by compression (1 - ratio), shown as a percentage.
+
+'in'
+     The size of the uncompressed data. When decompressing or testing,
+     it is shown as 'decompressed'. Note that plzip always prints the
+     uncompressed size before the compressed size when compressing,
+     decompressing, testing or listing.
+
+'out'
+     The size of the compressed data. When decompressing or testing, it
+     is shown as 'compressed'.
+
+
+   When decompressing or testing at verbosity level 4 (-vvvv), the
+dictionary size used to compress the file is also shown.
+
   LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may
 never have been compressed. Decompressed is used to refer to data which
 have undergone the process of decompression.
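
As a rough illustration of how the fields above relate to each other, the
following Python sketch recomputes them from the 'in' and 'out' sizes of the
example. The function name output_fields is hypothetical and is not part of
plzip; the number formatting only imitates the sample line.

```python
def output_fields(uncompressed_size: int, compressed_size: int) -> dict:
    """Recompute the ratio fields of the verbose output from the two sizes."""
    n_to_1 = uncompressed_size / compressed_size          # 'N:1'
    ratio = 100.0 * compressed_size / uncompressed_size   # 'ratio', in percent
    saved = 100.0 - ratio                                 # 'saved' = 1 - ratio
    return {
        "N:1": f"{n_to_1:.3f}:1",
        "ratio": f"{ratio:.2f}%",
        "saved": f"{saved:.2f}%",
    }


# Sizes taken from the 'plzip -v foo' example: 450560 in, 67493 out.
fields = output_fields(450560, 67493)
print(fields)
```

The three values are redundant views of the same two sizes, so any one of
them can be derived from 'in' and 'out' alone.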


-File: plzip.info,  Node: Invoking plzip,  Next: Program design,  Prev: Introduction,  Up: Top
+File: plzip.info,  Node: Invoking plzip,  Next: Program design,  Prev: Output,  Up: Top

-2 Invoking plzip
+3 Invoking plzip
 ****************

 The format for running plzip is:
@@ -135,7 +179,7 @@ The format for running plzip is:
 other FILES and is read just once, the first time it appears in the
 command line.

-   Plzip supports the following options:
+   plzip supports the following options:

 '-h'
 '--help'
@@ -154,12 +198,12 @@ command line.

 '-B BYTES'
 '--data-size=BYTES'
-     Set the size of the input data blocks, in bytes. The input file
-     will be divided in chunks of this size before compression is
-     performed. Valid values range from 8 KiB to 1 GiB. Default value
-     is two times the dictionary size, except for option '-0' where it
-     defaults to 1 MiB.  Plzip will reduce the dictionary size if it is
-     larger than the chosen data size.
+     When compressing, set the size of the input data blocks in bytes.
+     The input file will be divided in chunks of this size before
+     compression is performed. Valid values range from 8 KiB to 1 GiB.
+     Default value is two times the dictionary size, except for option
+     '-0' where it defaults to 1 MiB. Plzip will reduce the dictionary
+     size if it is larger than the chosen data size.

 '-c'
 '--stdout'
@@ -170,10 +214,10 @@ command line.

 '-d'
 '--decompress'
-     Decompress the specified file(s). If a file does not exist or
-     can't be opened, plzip continues decompressing the rest of the
-     files. If a file fails to decompress, plzip exits immediately
-     without decompressing the rest of the files.
+     Decompress the specified files. If a file does not exist or can't
+     be opened, plzip continues decompressing the rest of the files. If
+     a file fails to decompress, or is a terminal, plzip exits
+     immediately without decompressing the rest of the files.

 '-f'
 '--force'
@@ -181,8 +225,8 @@ command line.

 '-F'
 '--recompress'
-     Force re-compression of files whose name already has the '.lz' or
-     '.tlz' suffix.
+     When compressing, force re-compression of files whose name already
+     has the '.lz' or '.tlz' suffix.

 '-k'
 '--keep'
@@ -192,7 +236,7 @@ command line.
 '-l'
 '--list'
     Print the uncompressed size, compressed size and percentage saved
-     of the specified file(s). Trailing data are ignored. The values
+     of the specified files. Trailing data are ignored. The values
     produced are correct even for multimember files. If more than one
     file is given, a final line containing the cumulative sizes is
     printed. With '-v', the dictionary size, the number of members in
@@ -206,18 +250,21 @@ command line.

 '-m BYTES'
 '--match-length=BYTES'
-     Set the match length limit in bytes. After a match this long is
-     found, the search is finished. Valid values range from 5 to 273.
-     Larger values usually give better compression ratios but longer
-     compression times.
+     When compressing, set the match length limit in bytes. After a
+     match this long is found, the search is finished. Valid values
+     range from 5 to 273. Larger values usually give better compression
+     ratios but longer compression times.

 '-n N'
 '--threads=N'
-     Set the number of worker threads. Valid values range from 1 to "as
-     many as your system can support". If this option is not used,
-     plzip tries to detect the number of processors in the system and
-     use it as default value. 'plzip --help' shows the system's default
-     value.
+     Set the number of worker threads, overriding the system's default.
+     Valid values range from 1 to "as many as your system can support".
+     If this option is not used, plzip tries to detect the number of
+     processors in the system and use it as default value. When
+     compressing on a 32 bit system, plzip tries to limit the memory
+     use to under 2.22 GiB (4 worker threads at level -9) by reducing
+     the number of threads below the system's default. 'plzip --help'
+     shows the system's default value.

     Note that the number of usable threads is limited to
     ceil( file_size / data_size ) during compression (*note Minimum
@@ -228,8 +275,9 @@ command line.
 '--output=FILE'
     When reading from standard input and '--stdout' has not been
     specified, use 'FILE' as the virtual name of the uncompressed
-     file. This produces a file named 'FILE' when decompressing, and a
-     file named 'FILE.lz' when compressing.
+     file. This produces a file named 'FILE' when decompressing, or a
+     file named 'FILE.lz' when compressing. A second '.lz' extension is
+     not added if 'FILE' already ends in '.lz' or '.tlz'.

 '-q'
 '--quiet'
@@ -237,13 +285,13 @@ command line.

 '-s BYTES'
 '--dictionary-size=BYTES'
-     Set the dictionary size limit in bytes. Plzip will use the smallest
-     possible dictionary size for each file without exceeding this
-     limit.  Valid values range from 4 KiB to 512 MiB. Values 12 to 29
-     are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
-     that dictionary sizes are quantized. If the specified size does
-     not match one of the valid sizes, it will be rounded upwards by
-     adding up to (BYTES / 8) to it.
+     When compressing, set the dictionary size limit in bytes. Plzip
+     will use the smallest possible dictionary size for each file
+     without exceeding this limit. Valid values range from 4 KiB to
+     512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
+     2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
+     the specified size does not match one of the valid sizes, it will
+     be rounded upwards by adding up to (BYTES / 8) to it.

     For maximum compression you should use a dictionary size limit as
     large as possible, but keep in mind that the decompression memory
@@ -252,10 +300,10 @@ command line.

 '-t'
 '--test'
-     Check integrity of the specified file(s), but don't decompress
-     them.  This really performs a trial decompression and throws away
-     the result.  Use it together with '-v' to see information about
-     the file(s). If a file does not exist, can't be opened, or is a
+     Check integrity of the specified files, but don't decompress them.
+     This really performs a trial decompression and throws away the
+     result. Use it together with '-v' to see information about the
+     files. If a file does not exist, can't be opened, or is a
     terminal, plzip continues checking the rest of the files. If a
     file fails the test, plzip may be unable to check the rest of the
     files.
@@ -263,17 +311,19 @@ command line.
 '-v'
 '--verbose'
     Verbose mode.
-     When compressing, show the compression ratio for each file
-     processed. A second '-v' shows the progress of compression.
+     When compressing, show the compression ratio and size for each file
+     processed.
     When decompressing or testing, further -v's (up to 4) increase the
     verbosity level, showing status, compression ratio, dictionary
     size, decompressed size, and compressed size.
+     Two or more '-v' options show the progress of (de)compression,
+     except for single-member files.

 '-0 .. -9'
     Set the compression parameters (dictionary size and match length
     limit) as shown in the table below. The default compression level
     is '-6'.  Note that '-9' can be much slower than '-0'. These
-     options have no effect when decompressing.
+     options have no effect when decompressing, testing or listing.

     The bidimensional parameter space of LZMA can't be mapped to a
     linear scale optimal for all files. If your files are large, very
@@ -296,6 +346,13 @@ command line.
 '--best'
     Aliases for GNU gzip compatibility.

+'--loose-trailing'
+     When decompressing, testing or listing, allow trailing data whose
+     first bytes are so similar to the magic bytes of a lzip header
+     that they can be confused with a corrupt header. Use this option
+     if a file triggers a "corrupt header" error and the cause is not
+     indeed a corrupt header.
+
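
The interpretation of the argument of '--dictionary-size' described in the
option list above (values 12 to 29 are powers of two, other values are byte
counts from 4 KiB to 512 MiB) can be sketched in Python as follows. This is
only an illustration of the documented rule, not plzip's own code: the
function name dictionary_size_limit is hypothetical, and the sketch does not
model multiplier suffixes or the rounding of quantized sizes.

```python
MIN_DICT_SIZE = 4 * 1024          # 4 KiB, lower bound stated in the manual
MAX_DICT_SIZE = 512 * 1024 ** 2   # 512 MiB, upper bound stated in the manual


def dictionary_size_limit(value: int) -> int:
    """Map the numeric argument of '-s' to a size limit in bytes."""
    if 12 <= value <= 29:
        # Small values are exponents: 12 means 2^12 bytes, 29 means 2^29.
        return 1 << value
    if MIN_DICT_SIZE <= value <= MAX_DICT_SIZE:
        return value
    raise ValueError(f"dictionary size out of range: {value}")


print(dictionary_size_limit(12))       # smallest exponent form
print(dictionary_size_limit(1 << 20))  # plain byte count
```

Note that the two accepted ranges do not overlap, so an exponent can never be
mistaken for a byte count.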

   Numbers given as arguments to options may be followed by a multiplier
 and an optional 'B' for "byte".
@@ -321,7 +378,7 @@ caused plzip to panic.

 File: plzip.info,  Node: Program design,  Next: File format,  Prev: Invoking plzip,  Up: Top

-3 Program design
+4 Program design
 ****************

 When compressing, plzip divides the input file into chunks and
@@ -344,6 +401,17 @@ them to the workers. The workers (de)compress the blocks received from
 the splitter. The muxer collects processed packets from the workers, and
 writes them to the output file.

+                             ,------------,
+                         ,-->| worker   0 |--,
+                         |   `------------'  |
+,-------,   ,----------, |   ,------------,  |   ,-------,   ,--------,
+| input |-->| splitter |-+-->| worker   1 |--+-->| muxer |-->| output |
+| file  |   `----------' |   `------------'  |   `-------'   |  file  |
+`-------'                |        ...        |               `--------'
+                         |   ,------------,  |
+                         `-->| worker N-1 |--'
+                             `------------'
+
   When decompressing from a regular file, the splitter is removed and
 the workers read directly from the input file. If the output file is
 also a regular file, the muxer is also removed and the workers write
@@ -355,7 +423,7 @@ I/O speed.

 File: plzip.info,  Node: File format,  Next: Memory requirements,  Prev: Program design,  Up: Top

-4 File format
+5 File format
 *************

 Perfection is reached, not when there is no longer anything to add, but
@@ -426,17 +494,11 @@ additional information before, between, or after them.

 File: plzip.info,  Node: Memory requirements,  Next: Minimum file sizes,  Prev: File format,  Up: Top

-5 Memory required to compress and decompress
+6 Memory required to compress and decompress
 ********************************************

-The amount of memory required *per thread* is approximately the
-following:
-
-   * For compression at level -0; 1.5 MiB plus 3 times the data size
-     (*note --data-size::). Default is 4.5 MiB.
-
-   * For compression at other levels; 11 times the dictionary size plus
-     3 times the data size. Default is 136 MiB.
+The amount of memory required *per thread* for decompression or testing
+is approximately the following:

   * For decompression of a regular (seekable) file to another regular
     file, or for testing of a regular file; the dictionary size.
@@ -450,10 +512,35 @@ following:
   * For decompression of a non-seekable file or of standard input; the
     dictionary size plus up to 35 MiB.

+The amount of memory required *per thread* for compression is
+approximately the following:
+
+   * For compression at level -0; 1.5 MiB plus 3.375 times the data size
+     (*note --data-size::). Default is 4.875 MiB.
+
+   * For compression at other levels; 11 times the dictionary size plus
+     3.375 times the data size. Default is 142 MiB.
+
+The following table shows the memory required *per thread* for
+compression at a given level, using the default data size for each
+level:
+
+Level   Memory required
+-0      4.875 MiB
+-1      17.75 MiB
+-2      26.625 MiB
+-3      35.5 MiB
+-4      53.25 MiB
+-5      71 MiB
+-6      142 MiB
+-7      284 MiB
+-8      426 MiB
+-9      568 MiB
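
The table above follows from the two per-thread formulas. The Python sketch
below recomputes it as a cross-check. The per-level dictionary sizes are an
assumption taken from the compression level table of the manual, which is not
part of this diff; the function names are hypothetical.

```python
MIB = 1024 * 1024

# Dictionary size per level. ASSUMPTION: these values come from the level
# table of the plzip manual, which does not appear in this diff.
DICT_SIZE = {
    0: 64 * 1024, 1: 1 * MIB, 2: 3 * MIB // 2, 3: 2 * MIB, 4: 3 * MIB,
    5: 4 * MIB, 6: 8 * MIB, 7: 16 * MIB, 8: 24 * MIB, 9: 32 * MIB,
}


def default_data_size(level: int) -> int:
    """Default '--data-size': 1 MiB at -0, else twice the dictionary size."""
    return MIB if level == 0 else 2 * DICT_SIZE[level]


def compression_memory(level: int) -> float:
    """Approximate memory per thread, in bytes, at the default data size."""
    data_size = default_data_size(level)
    if level == 0:
        return 1.5 * MIB + 3.375 * data_size
    return 11 * DICT_SIZE[level] + 3.375 * data_size


for level in range(10):
    print(f"-{level}  {compression_memory(level) / MIB:g} MiB")

# Four worker threads at level -9, expressed in GiB.
four_threads_gib = 4 * compression_memory(9) / (1024 * MIB)
print(f"{four_threads_gib:.2f} GiB")
```

The last figure corresponds to the 2.22 GiB limit that the '--threads' option
mentions for 32 bit systems.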
+

 File: plzip.info,  Node: Minimum file sizes,  Next: Trailing data,  Prev: Memory requirements,  Up: Top

-6 Minimum file sizes required for full compression speed
+7 Minimum file sizes required for full compression speed
 ********************************************************

 When compressing, plzip divides the input file into chunks and
@@ -466,7 +553,8 @@ must be at least as large as the number of worker threads times the
 chunk size (*note --data-size::). Else some processors will not get any
 data to compress, and compression will be proportionally slower. The
 maximum speed increase achievable on a given file is limited by the
-ratio (file_size / data_size).
+ratio (file_size / data_size). For example, a tarball the size of gcc or
+linux will scale up to 8 processors at level -9.
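
The limit described in this paragraph can be sketched in Python as follows.
The function names are hypothetical, and the 64 MiB data size used for level
-9 assumes the default of two times a 32 MiB dictionary; neither is taken
from plzip's source.

```python
MIB = 1024 * 1024


def usable_threads(file_size: int, data_size: int, threads: int) -> int:
    """Worker threads that receive data: min(threads, ceil(file / data))."""
    chunks = -(-file_size // data_size)   # integer ceiling division
    return min(threads, chunks)


def min_file_size(threads: int, data_size: int) -> int:
    """Smallest input size that keeps every worker thread busy."""
    return threads * data_size


DATA_SIZE_9 = 64 * MIB   # ASSUMPTION: default data size at level -9

print(usable_threads(100 * MIB, DATA_SIZE_9, 8))   # small file, few chunks
print(min_file_size(8, DATA_SIZE_9) // MIB)        # MiB needed for 8 threads
```

Under these assumptions a file needs about 512 MiB of uncompressed data to
keep 8 processors busy at level -9.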

   The following table shows the minimum uncompressed file size needed
 for full use of N processors at a given compression level, using the
@@ -489,7 +577,7 @@ Level

 File: plzip.info,  Node: Trailing data,  Next: Examples,  Prev: Minimum file sizes,  Up: Top

-7 Extra data appended to the file
+8 Extra data appended to the file
 *********************************

 Sometimes extra data are found appended to a lzip file after the last
@@ -501,10 +589,11 @@ member. Such trailing data may be:

   * Useful data added by the user; a cryptographically secure hash, a
     description of file contents, etc. It is safe to append any amount
-     of text to a lzip file as long as the text does not begin with the
-     string "LZIP", and does not contain any zero bytes (null
-     characters). Nonzero bytes and zero bytes can't be safely mixed in
-     trailing data.
+     of text to a lzip file as long as none of the first four bytes of
+     the text match the corresponding byte in the string "LZIP", and
+     the text does not contain any zero bytes (null characters).
+     Nonzero bytes and zero bytes can't be safely mixed in trailing
+     data.

   * Garbage added by some not totally successful copy operation.

@@ -512,12 +601,17 @@ member. Such trailing data may be:
     and hash value (for a chosen hash) coincide with those of another
     file.

-   * In very rare cases, trailing data could be the corrupt header of
-     another member. In multimember or concatenated files the
-     probability of corruption happening in the magic bytes is 5 times
-     smaller than the probability of getting a false positive caused by
-     the corruption of the integrity information itself. Therefore it
-     can be considered to be below the noise level.
+   * In rare cases, trailing data could be the corrupt header of another
+     member. In multimember or concatenated files the probability of
+     corruption happening in the magic bytes is 5 times smaller than the
+     probability of getting a false positive caused by the corruption
+     of the integrity information itself. Therefore it can be
+     considered to be below the noise level. Additionally, the test
+     used by plzip to discriminate trailing data from a corrupt header
+     has a Hamming distance (HD) of 3, and the 3 bit flips must happen
+     in different magic bytes for the test to fail. In any case, the
+     option '--trailing-error' guarantees that any corrupt header will
+     be detected.
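
The rule for safely appended text given in the first item of the list above
(none of the first four bytes may match the corresponding byte of "LZIP", and
no zero bytes are allowed) can be expressed as a small Python check. This is a
sketch of the documented rule only; the function name is hypothetical and the
code is not the test that plzip itself uses to discriminate trailing data from
a corrupt header.

```python
MAGIC = b"LZIP"   # magic string named in the manual


def is_safe_trailing_text(text: bytes) -> bool:
    """Return True if 'text' satisfies the manual's rule for appended text."""
    if 0 in text:
        # Zero bytes must not be mixed with nonzero bytes in trailing data.
        return False
    # Compare only the first four bytes, position by position.
    return all(a != b for a, b in zip(text, MAGIC))


print(is_safe_trailing_text(b"sha256: 0123abcd\n"))
print(is_safe_trailing_text(b"Lorem ipsum\n"))   # 'L' matches first magic byte
```

Because the comparison is positional, a text is rejected as soon as any one
of its first four bytes equals the byte at the same position in the magic
string, even if the other three differ.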

   Trailing data are in no way part of the lzip file format, but tools
 reading lzip files are expected to behave as correctly and usefully as
@@ -531,7 +625,7 @@ cases where a file containing trailing data must be rejected, the option

 File: plzip.info,  Node: Examples,  Next: Problems,  Prev: Trailing data,  Up: Top

-8 A small tutorial with examples
+9 A small tutorial with examples
 ********************************

 WARNING! Even if plzip is bug-free, other causes may result in a corrupt
@@ -595,8 +689,8 @@ to decompressed byte 15000 (5000 bytes are produced).

 File: plzip.info,  Node: Problems,  Next: Concept index,  Prev: Examples,  Up: Top

-9 Reporting bugs
-****************
+10 Reporting bugs
+*****************

 There are probably bugs in plzip. There are certainly errors and
 omissions in this manual. If you report them, they will get fixed. If
@@ -625,6 +719,7 @@ Concept index
 * memory requirements:                   Memory requirements.   (line 6)
 * minimum file sizes:                    Minimum file sizes.    (line 6)
 * options:                               Invoking plzip.        (line 6)
+* output:                                Output.                (line 6)
 * program design:                        Program design.        (line 6)
 * trailing data:                         Trailing data.         (line 6)
 * usage:                                 Invoking plzip.        (line 6)
@@ -634,19 +729,20 @@ Concept index

 Tag Table:
 Node: Top221
-Node: Introduction1103
-Node: Invoking plzip5274
-Ref: --trailing-error5843
-Ref: --data-size6086
-Node: Program design12796
-Node: File format14383
-Node: Memory requirements16815
-Node: Minimum file sizes17815
-Node: Trailing data19741
-Node: Examples21648
-Ref: concat-example22813
-Node: Problems23388
-Node: Concept index23914
+Node: Introduction1158
+Node: Output5134
+Node: Invoking plzip6614
+Ref: --trailing-error7177
+Ref: --data-size7420
+Node: Program design14938
+Node: File format17090
+Node: Memory requirements19522
+Node: Minimum file sizes20985
+Node: Trailing data23002
+Node: Examples25285
+Ref: concat-example26450
+Node: Problems27025
+Node: Concept index27553

 End Tag Table