Merging upstream version 1.6~pre2.
Signed-off-by: Daniel Baumann <daniel@debian.org>
This commit is contained in:
parent
33502bf60d
commit
26fbdeadfd
15 changed files with 364 additions and 296 deletions
70
README
70
README
|
@ -11,35 +11,34 @@ compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
|
|||
Clzip is in fact a C language version of lzip, intended for embedded
|
||||
devices or systems lacking a C++ compiler.
|
||||
|
||||
The lzip file format is designed for long-term data archiving and
|
||||
provides very safe integrity checking. It is as simple as possible (but
|
||||
not simpler), so that with the only help of the lzip manual it would be
|
||||
possible for a digital archaeologist to extract the data from a lzip
|
||||
file long after quantum computers eventually render LZMA obsolete.
|
||||
Additionally lzip is copylefted, which guarantees that it will remain
|
||||
free forever.
|
||||
The lzip file format is designed for long-term data archiving, taking
|
||||
into account both data integrity and decoder availability:
|
||||
|
||||
The member trailer stores the 32-bit CRC of the original data, the size
|
||||
of the original data and the size of the member. These values, together
|
||||
with the value remaining in the range decoder and the end-of-stream
|
||||
marker, provide a 4 factor integrity checking which guarantees that the
|
||||
decompressed version of the data is identical to the original. This
|
||||
guards against corruption of the compressed data, and against undetected
|
||||
bugs in clzip (hopefully very unlikely). The chances of data corruption
|
||||
going undetected are microscopic. Be aware, though, that the check
|
||||
occurs upon decompression, so it can only tell you that something is
|
||||
wrong. It can't help you recover the original uncompressed data.
|
||||
* The lzip format provides very safe integrity checking and some data
|
||||
recovery means. The lziprecover program can repair bit-flip errors
|
||||
(one of the most common forms of data corruption) in lzip files,
|
||||
and provides data recovery capabilities, including error-checked
|
||||
merging of damaged copies of a file.
|
||||
|
||||
If you ever need to recover data from a damaged lzip file, try the
|
||||
lziprecover program. Lziprecover makes lzip files resistant to bit-flip
|
||||
(one of the most common forms of data corruption), and provides data
|
||||
recovery capabilities, including error-checked merging of damaged copies
|
||||
of a file.
|
||||
* The lzip format is as simple as possible (but not simpler). The
|
||||
lzip manual provides the code of a simple decompressor along with a
|
||||
detailed explanation of how it works, so that with the only help of
|
||||
the lzip manual it would be possible for a digital archaeologist to
|
||||
extract the data from a lzip file long after quantum computers
|
||||
eventually render LZMA obsolete.
|
||||
|
||||
* Additionally lzip is copylefted, which guarantees that it will
|
||||
remain free forever.
|
||||
|
||||
Clzip uses the same well-defined exit status values used by lzip and
|
||||
bzip2, which makes it safer than compressors returning ambiguous warning
|
||||
values (like gzip) when it is used as a back end for tar or zutils.
|
||||
|
||||
Clzip will automatically use the smallest possible dictionary size for
|
||||
each file without exceeding the given limit. Keep in mind that the
|
||||
decompression memory requirement is affected at compression time by the
|
||||
choice of dictionary size limit.
|
||||
|
||||
When compressing, clzip replaces every file given in the command line
|
||||
with a compressed version of itself, with the name "original_name.lz".
|
||||
When decompressing, clzip attempts to guess the name for the decompressed
|
||||
|
@ -78,18 +77,23 @@ Clzip is able to compress and decompress streams of unlimited size by
|
|||
automatically creating multi-member output. The members so created are
|
||||
large, about 64 PiB each.
|
||||
|
||||
Clzip will automatically use the smallest possible dictionary size
|
||||
without exceeding the given limit. Keep in mind that the decompression
|
||||
memory requirement is affected at compression time by the choice of
|
||||
dictionary size limit.
|
||||
There is no such thing as a "LZMA algorithm"; it is more like a "LZMA
|
||||
coding scheme". For example, the option '-0' of lzip uses the scheme in
|
||||
almost the simplest way possible; issuing the longest match it can find,
|
||||
or a literal byte if it can't find a match. Inversely, a much more
|
||||
elaborated way of finding coding sequences of minimum price than the one
|
||||
currently used by lzip could be developed, and the resulting sequence
|
||||
could also be coded using the LZMA coding scheme.
|
||||
|
||||
Clzip implements a simplified version of the LZMA (Lempel-Ziv-Markov
|
||||
chain-Algorithm) algorithm. The high compression of LZMA comes from
|
||||
combining two basic, well-proven compression ideas: sliding dictionaries
|
||||
(LZ77/78) and markov models (the thing used by every compression
|
||||
algorithm that uses a range encoder or similar order-0 entropy coder as
|
||||
its last stage) with segregation of contexts according to what the bits
|
||||
are used for.
|
||||
Lzip currently implements two variants of the LZMA algorithm; fast (used
|
||||
by option -0) and normal (used by all other compression levels). Clzip
|
||||
just implements the "normal" variant.
|
||||
|
||||
The high compression of LZMA comes from combining two basic, well-proven
|
||||
compression ideas: sliding dictionaries (LZ77/78) and markov models (the
|
||||
thing used by every compression algorithm that uses a range encoder or
|
||||
similar order-0 entropy coder as its last stage) with segregation of
|
||||
contexts according to what the bits are used for.
|
||||
|
||||
The ideas embodied in clzip are due to (at least) the following people:
|
||||
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue