Merging upstream version 1.7.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 8b4a400260
commit e789a1190c
10 changed files with 208 additions and 205 deletions
doc/clzip.texi (162 lines changed)
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 23 May 2015
-@set VERSION 1.7-rc1
+@set UPDATED 7 July 2015
+@set VERSION 1.7
 
 @dircategory Data Compression
 @direntry
@@ -36,9 +36,9 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
 
 @menu
 * Introduction:: Purpose and features of clzip
-* Algorithm:: How clzip compresses the data
 * Invoking clzip:: Command line interface
 * File format:: Detailed format of the compressed file
+* Algorithm:: How clzip compresses the data
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
@@ -72,10 +72,14 @@ availability:
 @itemize @bullet
 @item
 The lzip format provides very safe integrity checking and some data
-recovery means. The lziprecover program can repair bit-flip errors (one
-of the most common forms of data corruption) in lzip files, and provides
-data recovery capabilities, including error-checked merging of damaged
-copies of a file.
+recovery means. The
+@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
+program can repair bit-flip errors (one of the most common forms of data
+corruption) in lzip files, and provides data recovery capabilities,
+including error-checked merging of damaged copies of a file.
+@ifnothtml
+@ref{Data safety,,,lziprecover}.
+@end ifnothtml
 
 @item
 The lzip format is as simple as possible (but not simpler). The lzip
@@ -111,6 +115,11 @@ bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for other programs like
 tar or zutils.
 
+Clzip will automatically use the smallest possible dictionary size for
+each file without exceeding the given limit. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit.
+
 The amount of memory required for compression is about 1 or 2 times the
 dictionary size limit (1 if input file size is less than dictionary size
 limit, else 2) plus 9 times the dictionary size really used. The option
@@ -118,11 +127,6 @@ limit, else 2) plus 9 times the dictionary size really used. The option
 of memory required for decompression is about 46 kB larger than the
 dictionary size really used.
 
-Clzip will automatically use the smallest possible dictionary size for
-each file without exceeding the given limit. Keep in mind that the
-decompression memory requirement is affected at compression time by the
-choice of dictionary size limit.
-
 When compressing, clzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
 When decompressing, clzip attempts to guess the name for the decompressed
@@ -164,72 +168,6 @@ automatically creating multi-member output. The members so created are
 large, about 2 PiB each.
 
 
-@node Algorithm
-@chapter Algorithm
-@cindex algorithm
-
-In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
-concrete algorithm; it is more like "any algorithm using the LZMA coding
-scheme". For example, the option '-0' of lzip uses the scheme in almost
-the simplest way possible; issuing the longest match it can find, or a
-literal byte if it can't find a match. Inversely, a much more elaborated
-way of finding coding sequences of minimum size than the one currently
-used by lzip could be developed, and the resulting sequence could also
-be coded using the LZMA coding scheme.
-
-Clzip currently implements two variants of the LZMA algorithm; fast
-(used by option -0) and normal (used by all other compression levels).
-
-The high compression of LZMA comes from combining two basic, well-proven
-compression ideas: sliding dictionaries (LZ77/78) and markov models (the
-thing used by every compression algorithm that uses a range encoder or
-similar order-0 entropy coder as its last stage) with segregation of
-contexts according to what the bits are used for.
-
-Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
-which reduces redundancy by translating chunks of data to their
-corresponding distance-length pairs. The second stage is a range encoder
-that uses a different probability model for each type of data;
-distances, lengths, literal bytes, etc.
-
-Here is how it works, step by step:
-
-1) The member header is written to the output stream.
-
-2) The first byte is coded literally, because there are no previous
-bytes to which the match finder can refer to.
-
-3) The main encoder advances to the next byte in the input data and
-calls the match finder.
-
-4) The match finder fills an array with the minimum distances before the
-current byte where a match of a given length can be found.
-
-5) Go back to step 3 until a sequence (formed of pairs, repeated
-distances and literal bytes) of minimum price has been formed. Where the
-price represents the number of output bits produced.
-
-6) The range encoder encodes the sequence produced by the main encoder
-and sends the produced bytes to the output stream.
-
-7) Go back to step 3 until the input data are finished or until the
-member or volume size limits are reached.
-
-8) The range encoder is flushed.
-
-9) The member trailer is written to the output stream.
-
-10) If there are more data to compress, go back to step 1.
-
-@sp 1
-@noindent
-The ideas embodied in clzip are due to (at least) the following people:
-Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
-the definition of Markov chains), G.N.N. Martin (for the definition of
-range encoding), Igor Pavlov (for putting all the above together in
-LZMA), and Julian Seward (for bzip2's CLI).
-
-
 @node Invoking clzip
 @chapter Invoking clzip
 @cindex invoking
@@ -276,7 +214,7 @@ Force overwrite of output files.
 
 @item -F
 @itemx --recompress
-Force recompression of files whose name already has the @samp{.lz} or
+Force re-compression of files whose name already has the @samp{.lz} or
 @samp{.tlz} suffix.
 
 @item -k
@@ -476,6 +414,72 @@ facilitates safe recovery of undamaged members from multi-member files.
 @end table
 
 
+@node Algorithm
+@chapter Algorithm
+@cindex algorithm
+
+In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
+concrete algorithm; it is more like "any algorithm using the LZMA coding
+scheme". For example, the option @samp{-0} of lzip uses the scheme in almost
+the simplest way possible; issuing the longest match it can find, or a
+literal byte if it can't find a match. Inversely, a much more elaborated
+way of finding coding sequences of minimum size than the one currently
+used by lzip could be developed, and the resulting sequence could also
+be coded using the LZMA coding scheme.
+
+Clzip currently implements two variants of the LZMA algorithm; fast
+(used by option @samp{-0}) and normal (used by all other compression levels).
+
+The high compression of LZMA comes from combining two basic, well-proven
+compression ideas: sliding dictionaries (LZ77/78) and markov models (the
+thing used by every compression algorithm that uses a range encoder or
+similar order-0 entropy coder as its last stage) with segregation of
+contexts according to what the bits are used for.
+
+Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
+which reduces redundancy by translating chunks of data to their
+corresponding distance-length pairs. The second stage is a range encoder
+that uses a different probability model for each type of data;
+distances, lengths, literal bytes, etc.
+
+Here is how it works, step by step:
+
+1) The member header is written to the output stream.
+
+2) The first byte is coded literally, because there are no previous
+bytes to which the match finder can refer to.
+
+3) The main encoder advances to the next byte in the input data and
+calls the match finder.
+
+4) The match finder fills an array with the minimum distances before the
+current byte where a match of a given length can be found.
+
+5) Go back to step 3 until a sequence (formed of pairs, repeated
+distances and literal bytes) of minimum price has been formed. Where the
+price represents the number of output bits produced.
+
+6) The range encoder encodes the sequence produced by the main encoder
+and sends the produced bytes to the output stream.
+
+7) Go back to step 3 until the input data are finished or until the
+member or volume size limits are reached.
+
+8) The range encoder is flushed.
+
+9) The member trailer is written to the output stream.
+
+10) If there are more data to compress, go back to step 1.
+
+@sp 1
+@noindent
+The ideas embodied in clzip are due to (at least) the following people:
+Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
+the definition of Markov chains), G.N.N. Martin (for the definition of
+range encoding), Igor Pavlov (for putting all the above together in
+LZMA), and Julian Seward (for bzip2's CLI).
+
+
 @node Examples
 @chapter A small tutorial with examples
 @cindex examples
|
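Note on the memory figures in the hunks at -111,6 and -118,11: the manual text gives approximate formulas for compression and decompression memory. The C sketch below only illustrates that arithmetic and is not code from clzip. The function names and the `MIB` macro are hypothetical, `dict_used` stands for the "dictionary size really used", and reading "46 kB" as 46 * 1000 bytes is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (UINT64_C(1) << 20)   /* hypothetical helper: one mebibyte */

/* Approximate compression memory as described in the manual text:
   (1 or 2) * dictionary size limit + 9 * dictionary size really used.
   The factor is 1 when the input is smaller than the limit, else 2. */
static uint64_t est_compress_mem(uint64_t input_size, uint64_t dict_limit,
                                 uint64_t dict_used)
{
    assert(dict_used <= dict_limit);   /* the limit is never exceeded */
    const uint64_t factor = (input_size < dict_limit) ? 1 : 2;
    return factor * dict_limit + 9 * dict_used;
}

/* Approximate decompression memory: about 46 kB more than the
   dictionary size really used (assumption: 1 kB = 1000 bytes). */
static uint64_t est_decompress_mem(uint64_t dict_used)
{
    return dict_used + 46 * 1000;
}
```

Under these assumptions, a 1 MiB input compressed with an 8 MiB limit and a 1 MiB dictionary really used comes to 1 * 8 MiB + 9 * 1 MiB = 17 MiB, which shows why the limit chosen at compression time matters even for small files.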
|
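Note on the relocated Algorithm chapter: it describes option -0 as issuing the longest match it can find, or a literal byte if it can't find a match. The C sketch below illustrates that greedy idea with a brute-force search. It is not clzip's match finder and it omits the range encoder entirely. The names `token`, `longest_match` and `greedy_tokenize` are hypothetical, and the minimum match length of 2 is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { MIN_MATCH_LEN = 2 };   /* assumed minimum useful match length */

/* One output symbol: a distance-length pair, or a literal when length == 0. */
struct token {
    size_t distance;
    size_t length;
    unsigned char literal;
};

/* Brute-force search for the longest match starting before 'pos'.
   On ties the later start wins, i.e. the minimum distance is kept. */
static struct token longest_match(const unsigned char *buf, size_t size,
                                  size_t pos)
{
    assert(pos < size);
    struct token best = { 0, 0, buf[pos] };
    for (size_t start = 0; start < pos; ++start) {
        size_t len = 0;
        while (pos + len < size && buf[start + len] == buf[pos + len])
            ++len;
        if (len >= MIN_MATCH_LEN && len >= best.length) {
            best.distance = pos - start;
            best.length = len;
        }
    }
    return best;
}

/* Greedy pass: emit the longest match, or a literal if none is found.
   'out' must have room for 'size' tokens. Returns the token count. */
static size_t greedy_tokenize(const unsigned char *buf, size_t size,
                              struct token *out)
{
    size_t n = 0, pos = 0;
    while (pos < size) {
        const struct token t = longest_match(buf, size, pos);
        out[n++] = t;
        pos += (t.length > 0) ? t.length : 1;
    }
    return n;
}
```

The "normal" variant described in the chapter differs in that it weighs several candidate sequences by price before committing to one; this sketch always takes the first longest match.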