Merging upstream version 1.6~pre2.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 33502bf60d
commit 26fbdeadfd
15 changed files with 364 additions and 296 deletions
102
doc/clzip.info
@@ -11,7 +11,7 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir)
 Clzip Manual
 ************

-This manual is for Clzip (version 1.6-pre1, 30 January 2014).
+This manual is for Clzip (version 1.6-pre2, 6 May 2014).

 * Menu:

@@ -39,20 +39,31 @@ Clzip is a lossless data compressor with a user interface similar to the
 one of gzip or bzip2. Clzip decompresses almost as fast as gzip,
 compresses most files more than bzip2, and is better than both from a
 data recovery perspective. Clzip is a clean implementation of the LZMA
-algorithm.
+(Lempel-Ziv-Markov chain-Algorithm) algorithm.

 Clzip uses the lzip file format; the files produced by clzip are
 fully compatible with lzip-1.4 or newer, and can be rescued with
 lziprecover. Clzip is in fact a C language version of lzip, intended
 for embedded devices or systems lacking a C++ compiler.

-The lzip file format is designed for long-term data archiving and
-provides very safe integrity checking. It is as simple as possible (but
-not simpler), so that with the only help of the lzip manual it would be
-possible for a digital archaeologist to extract the data from a lzip
-file long after quantum computers eventually render LZMA obsolete.
-Additionally lzip is copylefted, which guarantees that it will remain
-free forever.
+The lzip file format is designed for long-term data archiving, taking
+into account both data integrity and decoder availability:
+
+* The lzip format provides very safe integrity checking and some data
+recovery means. The lziprecover program can repair bit-flip errors
+(one of the most common forms of data corruption) in lzip files,
+and provides data recovery capabilities, including error-checked
+merging of damaged copies of a file.
+
+* The lzip format is as simple as possible (but not simpler). The
+lzip manual provides the code of a simple decompressor along with
+a detailed explanation of how it works, so that with the only help
+of the lzip manual it would be possible for a digital
+archaeologist to extract the data from a lzip file long after
+quantum computers eventually render LZMA obsolete.
+
+* Additionally lzip is copylefted, which guarantees that it will
+remain free forever.

 The member trailer stores the 32-bit CRC of the original data, the
 size of the original data and the size of the member. These values,
@@ -66,16 +77,21 @@ though, that the check occurs upon decompression, so it can only tell
 you that something is wrong. It can't help you recover the original
 uncompressed data.

-If you ever need to recover data from a damaged lzip file, try the
-lziprecover program. Lziprecover makes lzip files resistant to bit-flip
-(one of the most common forms of data corruption), and provides data
-recovery capabilities, including error-checked merging of damaged copies
-of a file.
-
 Clzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer than compressors returning ambiguous warning
 values (like gzip) when it is used as a back end for tar or zutils.

+The amount of memory required for compression is about 1 or 2 times
+the dictionary size limit (1 if input file size is less than dictionary
+size limit, else 2) plus 9 times the dictionary size really used. The
+amount of memory required for decompression is about 46 kB larger than
+the dictionary size really used.
+
+Clzip will automatically use the smallest possible dictionary size
+for each file without exceeding the given limit. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit.
+
 When compressing, clzip replaces every file given in the command line
 with a compressed version of itself, with the name "original_name.lz".
 When decompressing, clzip attempts to guess the name for the
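Reviewer note on the hunk above: the memory figures it adds can be turned into simple arithmetic. The C sketch below is only an illustration under stated assumptions. The function names are hypothetical and are not part of clzip. It assumes that the dictionary size really used is the smaller of the file size and the limit (ignoring quantization and the 4 KiB minimum), and it reads "46 kB" as 46000 bytes.

```c
#include <stdint.h>

/* Approximate compression memory, following the manual's wording:
   (1 or 2) times the dictionary size limit, plus 9 times the
   dictionary size really used. */
uint64_t compress_mem_estimate(uint64_t dict_limit, uint64_t file_size)
{
  /* Assumption: the dictionary really used is min(file_size, dict_limit). */
  const uint64_t dict_used = (file_size < dict_limit) ? file_size : dict_limit;
  /* Factor is 1 if the file is smaller than the limit, else 2. */
  const uint64_t factor = (file_size < dict_limit) ? 1 : 2;
  return factor * dict_limit + 9 * dict_used;
}

/* Approximate decompression memory: dictionary size really used plus
   about 46 kB (assumed here to mean 46000 bytes). */
uint64_t decompress_mem_estimate(uint64_t dict_used)
{
  return dict_used + 46000;
}
```

With an 8 MiB limit, a 1 MiB input gives about 17 MiB for compression under these assumptions, while any input of 8 MiB or more gives about 88 MiB.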
@@ -114,30 +130,29 @@ multivolume compressed tar archives.
 automatically creating multi-member output. The members so created are
 large, about 64 PiB each.

-The amount of memory required for compression is about 1 or 2 times
-the dictionary size limit (1 if input file size is less than dictionary
-size limit, else 2) plus 9 times the dictionary size really used. The
-amount of memory required for decompression is about 46 kB larger than
-the dictionary size really used.
-
-Clzip will automatically use the smallest possible dictionary size
-without exceeding the given limit. Keep in mind that the decompression
-memory requirement is affected at compression time by the choice of
-dictionary size limit.
-

 File: clzip.info, Node: Algorithm, Next: Invoking clzip, Prev: Introduction, Up: Top

 2 Algorithm
 ***********

-Clzip implements a simplified version of the LZMA (Lempel-Ziv-Markov
-chain-Algorithm) algorithm. The high compression of LZMA comes from
-combining two basic, well-proven compression ideas: sliding dictionaries
-(LZ77/78) and markov models (the thing used by every compression
-algorithm that uses a range encoder or similar order-0 entropy coder as
-its last stage) with segregation of contexts according to what the bits
-are used for.
+There is no such thing as a "LZMA algorithm"; it is more like a "LZMA
+coding scheme". For example, the option '-0' of lzip uses the scheme in
+almost the simplest way possible; issuing the longest match it can find,
+or a literal byte if it can't find a match. Inversely, a much more
+elaborated way of finding coding sequences of minimum price than the one
+currently used by lzip could be developed, and the resulting sequence
+could also be coded using the LZMA coding scheme.
+
+Lzip currently implements two variants of the LZMA algorithm; fast
+(used by option -0) and normal (used by all other compression levels).
+Clzip just implements the "normal" variant.
+
+The high compression of LZMA comes from combining two basic,
+well-proven compression ideas: sliding dictionaries (LZ77/78) and
+markov models (the thing used by every compression algorithm that uses
+a range encoder or similar order-0 entropy coder as its last stage)
+with segregation of contexts according to what the bits are used for.

 Clzip is a two stage compressor. The first stage is a Lempel-Ziv
 coder, which reduces redundancy by translating chunks of data to their
@@ -145,11 +160,6 @@ corresponding distance-length pairs. The second stage is a range encoder
 that uses a different probability model for each type of data;
 distances, lengths, literal bytes, etc.

-The match finder, part of the LZ coder, is the most important piece
-of the LZMA algorithm, as it is in many Lempel-Ziv based algorithms.
-Most of clzip's execution time is spent in the match finder, and it has
-the greatest influence on the compression ratio.
-
 Here is how it works, step by step:

 1) The member header is written to the output stream.
@@ -261,7 +271,7 @@ The format for running clzip is:
 '--dictionary-size=BYTES'
 Set the dictionary size limit in bytes. Valid values range from 4
 KiB to 512 MiB. Clzip will use the smallest possible dictionary
-size for each member without exceeding this limit. Note that
+size for each file without exceeding this limit. Note that
 dictionary sizes are quantized. If the specified size does not
 match one of the valid sizes, it will be rounded upwards by adding
 up to (BYTES / 16) to it.
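Reviewer note on the hunk above: the text says dictionary sizes are quantized and rounded upwards, but this diff does not show the grid of valid sizes. The C sketch below assumes the grid implied by the coded dictionary size field of the lzip file format as I understand it: a power-of-two base between 2^12 and 2^29, minus 0 to 7 sixteenths of that base. The function name is hypothetical and is not clzip's own code.

```c
#include <stdint.h>

enum { MIN_DICT_BITS = 12, MAX_DICT_BITS = 29 };  /* 4 KiB .. 512 MiB */

/* Returns the smallest size on the assumed grid that is >= bytes,
   or 0 if bytes lies outside the valid 4 KiB .. 512 MiB range.
   Assumed grid: base - k * (base / 16), base = 2^bits, k = 0..7. */
uint32_t round_dict_size(uint32_t bytes)
{
  if (bytes < (UINT32_C(1) << MIN_DICT_BITS) ||
      bytes > (UINT32_C(1) << MAX_DICT_BITS))
    return 0;
  for (int bits = MIN_DICT_BITS; bits <= MAX_DICT_BITS; ++bits) {
    const uint32_t base = UINT32_C(1) << bits;
    if (base < bytes) continue;          /* need the first base >= bytes */
    const uint32_t wedge = base / 16;
    /* Try the largest subtraction first so the smallest fitting size wins. */
    for (uint32_t k = 7; ; --k) {
      const uint32_t size = base - k * wedge;
      if (size >= bytes) return size;    /* k == 0 always fits here */
      if (k == 0) break;
    }
  }
  return 0;
}
```

Under this assumption a request for 300 KiB is rounded up to 320 KiB (512 KiB minus 6 wedges of 32 KiB), and a request that already matches a grid point is returned unchanged.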
@@ -530,13 +540,13 @@ Concept index

 Tag Table:
 Node: Top210
-Node: Introduction921
-Node: Algorithm5557
-Node: Invoking clzip8057
-Node: File format13656
-Node: Examples16161
-Node: Problems18130
-Node: Concept index18656
+Node: Introduction916
+Node: Algorithm5823
+Node: Invoking clzip8629
+Node: File format14226
+Node: Examples16731
+Node: Problems18700
+Node: Concept index19226

 End Tag Table
