Merging upstream version 1.6~pre2.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 33502bf60d
commit 26fbdeadfd
15 changed files with 364 additions and 296 deletions
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 30 January 2014
-@set VERSION 1.6-pre1
+@set UPDATED 6 May 2014
+@set VERSION 1.6-pre2
 
 @dircategory Data Compression
 @direntry
@@ -59,20 +59,36 @@ Clzip is a lossless data compressor with a user interface similar to the
 one of gzip or bzip2. Clzip decompresses almost as fast as gzip,
 compresses most files more than bzip2, and is better than both from a
 data recovery perspective. Clzip is a clean implementation of the LZMA
-algorithm.
+(Lempel-Ziv-Markov chain-Algorithm) algorithm.
 
 Clzip uses the lzip file format; the files produced by clzip are fully
 compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
 Clzip is in fact a C language version of lzip, intended for embedded
 devices or systems lacking a C++ compiler.
 
-The lzip file format is designed for long-term data archiving and
-provides very safe integrity checking. It is as simple as possible (but
-not simpler), so that with the only help of the lzip manual it would be
-possible for a digital archaeologist to extract the data from a lzip
-file long after quantum computers eventually render LZMA obsolete.
+The lzip file format is designed for long-term data archiving, taking
+into account both data integrity and decoder availability:
+
+@itemize @bullet
+@item
+The lzip format provides very safe integrity checking and some data
+recovery means. The lziprecover program can repair bit-flip errors (one
+of the most common forms of data corruption) in lzip files, and provides
+data recovery capabilities, including error-checked merging of damaged
+copies of a file.
+
+@item
+The lzip format is as simple as possible (but not simpler). The lzip
+manual provides the code of a simple decompressor along with a detailed
+explanation of how it works, so that with the only help of the lzip
+manual it would be possible for a digital archaeologist to extract the
+data from a lzip file long after quantum computers eventually render
+LZMA obsolete.
+
+@item
+Additionally lzip is copylefted, which guarantees that it will remain
+free forever.
+@end itemize
 
 The member trailer stores the 32-bit CRC of the original data, the size
 of the original data and the size of the member. These values, together
@@ -85,16 +101,21 @@ going undetected are microscopic. Be aware, though, that the check
 occurs upon decompression, so it can only tell you that something is
 wrong. It can't help you recover the original uncompressed data.
 
-If you ever need to recover data from a damaged lzip file, try the
-lziprecover program. Lziprecover makes lzip files resistant to bit-flip
-(one of the most common forms of data corruption), and provides data
-recovery capabilities, including error-checked merging of damaged copies
-of a file.
-
 Clzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for tar or zutils.
 
+The amount of memory required for compression is about 1 or 2 times the
+dictionary size limit (1 if input file size is less than dictionary size
+limit, else 2) plus 9 times the dictionary size really used. The amount
+of memory required for decompression is about 46 kB larger than the
+dictionary size really used.
+
+Clzip will automatically use the smallest possible dictionary size for
+each file without exceeding the given limit. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit.
+
 When compressing, clzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
 When decompressing, clzip attempts to guess the name for the decompressed
@@ -135,29 +156,28 @@ Clzip is able to compress and decompress streams of unlimited size by
 automatically creating multi-member output. The members so created are
 large, about 64 PiB each.
 
-The amount of memory required for compression is about 1 or 2 times the
-dictionary size limit (1 if input file size is less than dictionary size
-limit, else 2) plus 9 times the dictionary size really used. The amount
-of memory required for decompression is about 46 kB larger than the
-dictionary size really used.
-
-Clzip will automatically use the smallest possible dictionary size
-without exceeding the given limit. Keep in mind that the decompression
-memory requirement is affected at compression time by the choice of
-dictionary size limit.
-
 
 @node Algorithm
 @chapter Algorithm
 @cindex algorithm
 
-Clzip implements a simplified version of the LZMA (Lempel-Ziv-Markov
-chain-Algorithm) algorithm. The high compression of LZMA comes from
-combining two basic, well-proven compression ideas: sliding dictionaries
-(LZ77/78) and markov models (the thing used by every compression
-algorithm that uses a range encoder or similar order-0 entropy coder as
-its last stage) with segregation of contexts according to what the bits
-are used for.
+There is no such thing as a "LZMA algorithm"; it is more like a "LZMA
+coding scheme". For example, the option '-0' of lzip uses the scheme in
+almost the simplest way possible; issuing the longest match it can find,
+or a literal byte if it can't find a match. Inversely, a much more
+elaborated way of finding coding sequences of minimum price than the one
+currently used by lzip could be developed, and the resulting sequence
+could also be coded using the LZMA coding scheme.
+
+Lzip currently implements two variants of the LZMA algorithm; fast (used
+by option -0) and normal (used by all other compression levels). Clzip
+just implements the "normal" variant.
+
+The high compression of LZMA comes from combining two basic, well-proven
+compression ideas: sliding dictionaries (LZ77/78) and markov models (the
+thing used by every compression algorithm that uses a range encoder or
+similar order-0 entropy coder as its last stage) with segregation of
+contexts according to what the bits are used for.
 
 Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
 which reduces redundancy by translating chunks of data to their
@@ -165,11 +185,6 @@ corresponding distance-length pairs. The second stage is a range encoder
 that uses a different probability model for each type of data;
 distances, lengths, literal bytes, etc.
 
-The match finder, part of the LZ coder, is the most important piece of
-the LZMA algorithm, as it is in many Lempel-Ziv based algorithms. Most
-of clzip's execution time is spent in the match finder, and it has the
-greatest influence on the compression ratio.
-
 Here is how it works, step by step:
 
 1) The member header is written to the output stream.
@@ -284,7 +299,7 @@ Quiet operation. Suppress all messages.
 @itemx --dictionary-size=@var{bytes}
 Set the dictionary size limit in bytes. Valid values range from 4 KiB to
 512 MiB. Clzip will use the smallest possible dictionary size for each
-member without exceeding this limit. Note that dictionary sizes are
+file without exceeding this limit. Note that dictionary sizes are
 quantized. If the specified size does not match one of the valid sizes,
 it will be rounded upwards by adding up to (@var{bytes} / 16) to it.
 
|