Merging upstream version 1.7.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 20:40:56 +01:00
parent 8b4a400260
commit e789a1190c
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
10 changed files with 208 additions and 205 deletions

@@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 23 May 2015
@set VERSION 1.7-rc1
@set UPDATED 7 July 2015
@set VERSION 1.7
@dircategory Data Compression
@direntry
@@ -36,9 +36,9 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
@menu
* Introduction:: Purpose and features of clzip
* Algorithm:: How clzip compresses the data
* Invoking clzip:: Command line interface
* File format:: Detailed format of the compressed file
* Algorithm:: How clzip compresses the data
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@@ -72,10 +72,14 @@ availability:
@itemize @bullet
@item
The lzip format provides very safe integrity checking and some data
recovery means. The lziprecover program can repair bit-flip errors (one
of the most common forms of data corruption) in lzip files, and provides
data recovery capabilities, including error-checked merging of damaged
copies of a file.
recovery means. The
@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
program can repair bit-flip errors (one of the most common forms of data
corruption) in lzip files, and provides data recovery capabilities,
including error-checked merging of damaged copies of a file.
@ifnothtml
@ref{Data safety,,,lziprecover}.
@end ifnothtml
@item
The lzip format is as simple as possible (but not simpler). The lzip
@@ -111,6 +115,11 @@ bzip2, which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs like
tar or zutils.
Clzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit.
The amount of memory required for compression is about 1 or 2 times the
dictionary size limit (1 if the input file size is less than the
dictionary size limit, else 2) plus 9 times the dictionary size really
used. The option
@@ -118,11 +127,6 @@ limit, else 2) plus 9 times the dictionary size really used. The option
of memory required for decompression is about 46 kB larger than the
dictionary size really used.
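As a rough illustration, these estimates can be written out in C as
follows. This is a sketch only: the constants are the approximations
given above, and the function names are hypothetical, not taken from
the clzip sources.

@example
/* Memory estimates from the figures above.  Hypothetical helper
   functions; the constants are approximations, not exact values. */
long long compression_memory( const long long file_size,
                              const long long dict_size_limit,
                              const long long dict_size_used )
  {
  /* 1 or 2 times the limit (1 if the input file fits in the
     dictionary size limit, else 2) plus 9 times the size really used. */
  const int factor = ( file_size < dict_size_limit ) ? 1 : 2;
  return factor * dict_size_limit + 9 * dict_size_used;
  }

long long decompression_memory( const long long dict_size_used )
  {
  return dict_size_used + 46 * 1000;    /* "about 46 kB" larger */
  }
@end example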
Clzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit.
When compressing, clzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
When decompressing, clzip attempts to guess the name for the decompressed
@@ -164,72 +168,6 @@ automatically creating multi-member output. The members so created are
large, about 2 PiB each.
@node Algorithm
@chapter Algorithm
@cindex algorithm
In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
concrete algorithm; it is more like "any algorithm using the LZMA coding
scheme". For example, the option '-0' of lzip uses the scheme in almost
the simplest way possible: issuing the longest match it can find, or a
literal byte if it can't find a match. Conversely, a much more elaborate
way of finding coding sequences of minimum size than the one currently
used by lzip could be developed, and the resulting sequence could also
be coded using the LZMA coding scheme.
Clzip currently implements two variants of the LZMA algorithm: fast
(used by option -0) and normal (used by all other compression levels).
The high compression of LZMA comes from combining two basic, well-proven
compression ideas: sliding dictionaries (LZ77/78) and Markov models (the
thing used by every compression algorithm that uses a range encoder or
similar order-0 entropy coder as its last stage) with segregation of
contexts according to what the bits are used for.
Clzip is a two-stage compressor. The first stage is a Lempel-Ziv coder,
which reduces redundancy by translating chunks of data to their
corresponding distance-length pairs. The second stage is a range encoder
that uses a different probability model for each type of data:
distances, lengths, literal bytes, etc.
Here is how it works, step by step:
1) The member header is written to the output stream.
2) The first byte is coded literally, because there are no previous
bytes to which the match finder can refer.
3) The main encoder advances to the next byte in the input data and
calls the match finder.
4) The match finder fills an array with the minimum distances before the
current byte where a match of a given length can be found.
5) Go back to step 3 until a sequence (formed of pairs, repeated
distances and literal bytes) of minimum price has been formed, where the
price represents the number of output bits produced.
6) The range encoder encodes the sequence produced by the main encoder
and sends the produced bytes to the output stream.
7) Go back to step 3 until the input data are finished or until the
member or volume size limits are reached.
8) The range encoder is flushed.
9) The member trailer is written to the output stream.
10) If there are more data to compress, go back to step 1.
@sp 1
@noindent
The ideas embodied in clzip are due to (at least) the following people:
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
the definition of Markov chains), G.N.N. Martin (for the definition of
range encoding), Igor Pavlov (for putting all the above together in
LZMA), and Julian Seward (for bzip2's CLI).
@node Invoking clzip
@chapter Invoking clzip
@cindex invoking
@@ -276,7 +214,7 @@ Force overwrite of output files.
@item -F
@itemx --recompress
Force recompression of files whose name already has the @samp{.lz} or
Force re-compression of files whose name already has the @samp{.lz} or
@samp{.tlz} suffix.
@item -k
@@ -476,6 +414,72 @@ facilitates safe recovery of undamaged members from multi-member files.
@end table
@node Algorithm
@chapter Algorithm
@cindex algorithm
In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
concrete algorithm; it is more like "any algorithm using the LZMA coding
scheme". For example, the option @samp{-0} of lzip uses the scheme in almost
the simplest way possible: issuing the longest match it can find, or a
literal byte if it can't find a match. Conversely, a much more elaborate
way of finding coding sequences of minimum size than the one currently
used by lzip could be developed, and the resulting sequence could also
be coded using the LZMA coding scheme.
Clzip currently implements two variants of the LZMA algorithm: fast
(used by option @samp{-0}) and normal (used by all other compression levels).
The high compression of LZMA comes from combining two basic, well-proven
compression ideas: sliding dictionaries (LZ77/78) and Markov models (the
thing used by every compression algorithm that uses a range encoder or
similar order-0 entropy coder as its last stage) with segregation of
contexts according to what the bits are used for.
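That segregation of contexts can be pictured as one adaptive
probability counter per coding context. The following minimal sketch
uses the customary LZMA parameters (11-bit probabilities, a move shift
of 5); it is illustrative, and the names are assumptions rather than
identifiers from the clzip sources.

@example
/* One adaptive probability per coding context, in the customary
   LZMA style: an 11-bit probability moved by a shift of 5.
   Illustrative sketch; not code copied from clzip. */
enum { bit_model_total_bits = 11,
       bit_model_total = 1 << bit_model_total_bits,
       bit_model_move_bits = 5 };

typedef struct { int probability; } Bit_model;  /* init to bit_model_total / 2 */

/* After coding a bit in a given context, move that context's
   probability towards the value just seen. */
void Bm_update( Bit_model * const bm, const int bit )
  {
  if( bit == 0 )
    bm->probability +=
      ( bit_model_total - bm->probability ) >> bit_model_move_bits;
  else
    bm->probability -= bm->probability >> bit_model_move_bits;
  }
@end example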
Clzip is a two-stage compressor. The first stage is a Lempel-Ziv coder,
which reduces redundancy by translating chunks of data to their
corresponding distance-length pairs. The second stage is a range encoder
that uses a different probability model for each type of data:
distances, lengths, literal bytes, etc.
Here is how it works, step by step:
1) The member header is written to the output stream.
2) The first byte is coded literally, because there are no previous
bytes to which the match finder can refer.
3) The main encoder advances to the next byte in the input data and
calls the match finder.
4) The match finder fills an array with the minimum distances before the
current byte where a match of a given length can be found.
5) Go back to step 3 until a sequence (formed of pairs, repeated
distances and literal bytes) of minimum price has been formed, where the
price represents the number of output bits produced.
6) The range encoder encodes the sequence produced by the main encoder
and sends the produced bytes to the output stream.
7) Go back to step 3 until the input data are finished or until the
member or volume size limits are reached.
8) The range encoder is flushed.
9) The member trailer is written to the output stream.
10) If there are more data to compress, go back to step 1.
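The loop above can be condensed into the following sketch. All the
type and function names here are hypothetical (the real encoder in
clzip is considerably more involved); the sketch only mirrors the
numbered steps.

@example
typedef struct Encoder Encoder;     /* hypothetical types; the real */
typedef struct Sequence Sequence;   /* definitions are elsewhere    */

void write_member_header( Encoder * e );              /* hypothetical */
int first_byte( Encoder * e );                        /* helpers,     */
void encode_literal( Encoder * e, int byte );         /* declared to  */
int data_finished( const Encoder * e );               /* keep the     */
int member_size_reached( const Encoder * e );         /* sketch valid */
Sequence * find_minimum_price_sequence( Encoder * e );
void range_encode_sequence( Encoder * e, const Sequence * seq );
void range_encoder_flush( Encoder * e );
void write_member_trailer( Encoder * e );

/* Steps 1 to 10 above, as one loop per member. */
void compress_member( Encoder * const e )
  {
  write_member_header( e );                 /* step 1 */
  encode_literal( e, first_byte( e ) );     /* step 2: no history yet */
  while( !data_finished( e ) && !member_size_reached( e ) )
    {
    /* steps 3 to 5: advance, call the match finder, and collect
       pairs, repeated distances and literal bytes until a sequence
       of minimum price (fewest output bits) has been formed */
    Sequence * const seq = find_minimum_price_sequence( e );
    range_encode_sequence( e, seq );        /* step 6: emit the bytes */
    }                                       /* step 7: loop back */
  range_encoder_flush( e );                 /* step 8 */
  write_member_trailer( e );                /* step 9 */
  /* step 10: the caller begins a new member while data remain */
  }
@end example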
@sp 1
@noindent
The ideas embodied in clzip are due to (at least) the following people:
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
the definition of Markov chains), G.N.N. Martin (for the definition of
range encoding), Igor Pavlov (for putting all the above together in
LZMA), and Julian Seward (for bzip2's CLI).
@node Examples
@chapter A small tutorial with examples
@cindex examples