Merging upstream version 1.10.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 6e0e1539c4
commit 15bdbbe06a
24 changed files with 811 additions and 454 deletions
|
@@ -1,5 +1,5 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||
.TH CLZIP "1" "April 2017" "clzip 1.9" "User Commands"
|
||||
.TH CLZIP "1" "February 2018" "clzip 1.10" "User Commands"
|
||||
.SH NAME
|
||||
clzip \- reduces the size of files
|
||||
.SH SYNOPSIS
|
||||
|
@@ -52,7 +52,7 @@ suppress all messages
|
|||
set dictionary size limit in bytes [8 MiB]
|
||||
.TP
|
||||
\fB\-S\fR, \fB\-\-volume\-size=\fR<bytes>
|
||||
set volume size limit in bytes
|
||||
set volume size limit in bytes, implies \fB\-k\fR
|
||||
.TP
|
||||
\fB\-t\fR, \fB\-\-test\fR
|
||||
test compressed file integrity
|
||||
|
@@ -68,6 +68,9 @@ alias for \fB\-0\fR
|
|||
.TP
|
||||
\fB\-\-best\fR
|
||||
alias for \fB\-9\fR
|
||||
.TP
|
||||
\fB\-\-loose\-trailing\fR
|
||||
allow trailing data seeming corrupt header
|
||||
.PP
|
||||
If no file names are given, or if a file is '\-', clzip compresses or
|
||||
decompresses from standard input to standard output.
|
||||
|
@@ -90,7 +93,7 @@ Report bugs to lzip\-bug@nongnu.org
|
|||
.br
|
||||
Clzip home page: http://www.nongnu.org/lzip/clzip.html
|
||||
.SH COPYRIGHT
|
||||
Copyright \(co 2017 Antonio Diaz Diaz.
|
||||
Copyright \(co 2018 Antonio Diaz Diaz.
|
||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
|
|
283
doc/clzip.info
|
@@ -11,11 +11,12 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Clzip Manual
|
||||
************
|
||||
|
||||
This manual is for Clzip (version 1.9, 13 April 2017).
|
||||
This manual is for Clzip (version 1.10, 6 February 2018).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of clzip
|
||||
* Output:: Meaning of clzip's output
|
||||
* Invoking clzip:: Command line interface
|
||||
* Quality assurance:: Design, development and testing of lzip
|
||||
* File format:: Detailed format of the compressed file
|
||||
|
@@ -28,13 +29,13 @@ This manual is for Clzip (version 1.9, 13 April 2017).
|
|||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2010-2017 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to
|
||||
copy, distribute and modify it.
|
||||
|
||||
|
||||
File: clzip.info, Node: Introduction, Next: Invoking clzip, Prev: Top, Up: Top
|
||||
File: clzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
|
||||
|
||||
1 Introduction
|
||||
**************
|
||||
|
@@ -55,7 +56,7 @@ archiving, taking into account both data integrity and decoder
|
|||
availability:
|
||||
|
||||
* The lzip format provides very safe integrity checking and some data
|
||||
recovery means. The lziprecover program can repair bit-flip errors
|
||||
recovery means. The lziprecover program can repair bit flip errors
|
||||
(one of the most common forms of data corruption) in lzip files,
|
||||
and provides data recovery capabilities, including error-checked
|
||||
merging of damaged copies of a file. *Note Data safety:
|
||||
|
@@ -129,7 +130,7 @@ entirely incomprehensible and therefore pointless.
|
|||
|
||||
Clzip will correctly decompress a file which is the concatenation of
|
||||
two or more compressed files. The result is the concatenation of the
|
||||
corresponding uncompressed files. Integrity testing of concatenated
|
||||
corresponding decompressed files. Integrity testing of concatenated
|
||||
compressed files is also supported.
|
||||
|
||||
Clzip can produce multimember files, and lziprecover can safely
|
||||
|
@@ -142,14 +143,58 @@ multivolume compressed tar archives.
|
|||
automatically creating multimember output. The members so created are
|
||||
large, about 2 PiB each.
|
||||
|
||||
|
||||
File: clzip.info, Node: Output, Next: Invoking clzip, Prev: Introduction, Up: Top
|
||||
|
||||
2 Meaning of clzip's output
|
||||
***************************
|
||||
|
||||
The output of clzip looks like this:
|
||||
|
||||
clzip -v foo
|
||||
foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
|
||||
|
||||
clzip -tvv foo.lz
|
||||
foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
|
||||
|
||||
The meaning of each field is as follows:
|
||||
|
||||
'N:1'
|
||||
The compression ratio (uncompressed_size / compressed_size), shown
|
||||
as N to 1.
|
||||
|
||||
'ratio'
|
||||
The inverse compression ratio
|
||||
(compressed_size / uncompressed_size), shown as a percentage. A
|
||||
decimal ratio is easily obtained by moving the decimal point two
|
||||
places to the left; 14.98% = 0.1498.
|
||||
|
||||
'saved'
|
||||
The space saved by compression (1 - ratio), shown as a percentage.
|
||||
|
||||
'in'
|
||||
The size of the uncompressed data. When decompressing or testing,
|
||||
it is shown as 'decompressed'. Note that clzip always prints the
|
||||
uncompressed size before the compressed size when compressing,
|
||||
decompressing, testing or listing.
|
||||
|
||||
'out'
|
||||
The size of the compressed data. When decompressing or testing, it
|
||||
is shown as 'compressed'.
|
||||
|
||||
|
||||
When decompressing or testing at verbosity level 4 (-vvvv), the
|
||||
dictionary size used to compress the file and the CRC32 of the
|
||||
uncompressed data are also shown.
|
||||
|
||||
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may
|
||||
never have been compressed. Decompressed is used to refer to data which
|
||||
have undergone the process of decompression.
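
As a worked check of the fields described in the new Output node, the following Python sketch recomputes the three derived figures from the two sizes in the sample line (450560 in, 67493 out). The function name `summarize` and the use of format strings for rounding are assumptions made for illustration; this is not clzip's own code.

```python
def summarize(uncompressed_size: int, compressed_size: int) -> str:
    """Format the derived fields of a clzip-style summary line (illustrative)."""
    n_to_1 = uncompressed_size / compressed_size  # 'N:1' field
    ratio = compressed_size / uncompressed_size   # 'ratio' field, as a fraction
    saved = 1 - ratio                             # 'saved' field, as a fraction
    return (f"{n_to_1:.3f}:1, {ratio * 100:.2f}% ratio, "
            f"{saved * 100:.2f}% saved")


# Sizes taken from the sample 'clzip -v foo' line in the manual text.
print(summarize(450560, 67493))
```

Run against the sample sizes, the sketch reproduces the "6.676:1, 14.98% ratio, 85.02% saved" part of the line shown in the manual.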
|
||||
|
||||
|
||||
File: clzip.info, Node: Invoking clzip, Next: Quality assurance, Prev: Introduction, Up: Top
|
||||
File: clzip.info, Node: Invoking clzip, Next: Quality assurance, Prev: Output, Up: Top
|
||||
|
||||
2 Invoking clzip
|
||||
3 Invoking clzip
|
||||
****************
|
||||
|
||||
The format for running clzip is:
|
||||
|
@@ -160,7 +205,7 @@ The format for running clzip is:
|
|||
other FILES and is read just once, the first time it appears in the
|
||||
command line.
|
||||
|
||||
Clzip supports the following options:
|
||||
clzip supports the following options:
|
||||
|
||||
'-h'
|
||||
'--help'
|
||||
|
@@ -179,9 +224,10 @@ command line.
|
|||
|
||||
'-b BYTES'
|
||||
'--member-size=BYTES'
|
||||
Set the member size limit to BYTES. A small member size may
|
||||
degrade compression ratio, so use it only when needed. Valid values
|
||||
range from 100 kB to 2 PiB. Defaults to 2 PiB.
|
||||
When compressing, set the member size limit to BYTES. A small
|
||||
member size may degrade compression ratio, so use it only when
|
||||
needed. Valid values range from 100 kB to 2 PiB. Defaults to
|
||||
2 PiB.
|
||||
|
||||
'-c'
|
||||
'--stdout'
|
||||
|
@@ -189,15 +235,15 @@ command line.
|
|||
unchanged. If compressing several files, each file is compressed
|
||||
independently. This option is needed when reading from a named
|
||||
pipe (fifo) or from a device. Use it also to recover as much of
|
||||
the uncompressed data as possible when decompressing a corrupt
|
||||
the decompressed data as possible when decompressing a corrupt
|
||||
file.
|
||||
|
||||
'-d'
|
||||
'--decompress'
|
||||
Decompress the specified file(s). If a file does not exist or
|
||||
can't be opened, clzip continues decompressing the rest of the
|
||||
files. If a file fails to decompress, clzip exits immediately
|
||||
without decompressing the rest of the files.
|
||||
Decompress the specified files. If a file does not exist or can't
|
||||
be opened, clzip continues decompressing the rest of the files. If
|
||||
a file fails to decompress, or is a terminal, clzip exits
|
||||
immediately without decompressing the rest of the files.
|
||||
|
||||
'-f'
|
||||
'--force'
|
||||
|
@@ -205,8 +251,8 @@ command line.
|
|||
|
||||
'-F'
|
||||
'--recompress'
|
||||
Force re-compression of files whose name already has the '.lz' or
|
||||
'.tlz' suffix.
|
||||
When compressing, force re-compression of files whose name already
|
||||
has the '.lz' or '.tlz' suffix.
|
||||
|
||||
'-k'
|
||||
'--keep'
|
||||
|
@@ -216,7 +262,7 @@ command line.
|
|||
'-l'
|
||||
'--list'
|
||||
Print the uncompressed size, compressed size and percentage saved
|
||||
of the specified file(s). Trailing data are ignored. The values
|
||||
of the specified files. Trailing data are ignored. The values
|
||||
produced are correct even for multimember files. If more than one
|
||||
file is given, a final line containing the cumulative sizes is
|
||||
printed. With '-v', the dictionary size, the number of members in
|
||||
|
@@ -230,19 +276,20 @@ command line.
|
|||
|
||||
'-m BYTES'
|
||||
'--match-length=BYTES'
|
||||
Set the match length limit in bytes. After a match this long is
|
||||
found, the search is finished. Valid values range from 5 to 273.
|
||||
Larger values usually give better compression ratios but longer
|
||||
compression times.
|
||||
When compressing, set the match length limit in bytes. After a
|
||||
match this long is found, the search is finished. Valid values
|
||||
range from 5 to 273. Larger values usually give better compression
|
||||
ratios but longer compression times.
|
||||
|
||||
'-o FILE'
|
||||
'--output=FILE'
|
||||
When reading from standard input and '--stdout' has not been
|
||||
specified, use 'FILE' as the virtual name of the uncompressed
|
||||
file. This produces a file named 'FILE' when decompressing, a file
|
||||
named 'FILE.lz' when compressing, and several files named
|
||||
'FILE00001.lz', 'FILE00002.lz', etc, when compressing and
|
||||
splitting the output in volumes.
|
||||
file. This produces a file named 'FILE' when decompressing, or a
|
||||
file named 'FILE.lz' when compressing. A second '.lz' extension is
|
||||
not added if 'FILE' already ends in '.lz' or '.tlz'. When
|
||||
compressing and splitting the output in volumes, several files
|
||||
named 'FILE00001.lz', 'FILE00002.lz', etc, are created.
|
||||
|
||||
'-q'
|
||||
'--quiet'
|
||||
|
@@ -250,13 +297,13 @@ command line.
|
|||
|
||||
'-s BYTES'
|
||||
'--dictionary-size=BYTES'
|
||||
Set the dictionary size limit in bytes. Clzip will use the smallest
|
||||
possible dictionary size for each file without exceeding this
|
||||
limit. Valid values range from 4 KiB to 512 MiB. Values 12 to 29
|
||||
are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
|
||||
that dictionary sizes are quantized. If the specified size does
|
||||
not match one of the valid sizes, it will be rounded upwards by
|
||||
adding up to (BYTES / 8) to it.
|
||||
When compressing, set the dictionary size limit in bytes. Clzip
|
||||
will use the smallest possible dictionary size for each file
|
||||
without exceeding this limit. Valid values range from 4 KiB to
|
||||
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
|
||||
2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
|
||||
the specified size does not match one of the valid sizes, it will
|
||||
be rounded upwards by adding up to (BYTES / 8) to it.
|
||||
|
||||
For maximum compression you should use a dictionary size limit as
|
||||
large as possible, but keep in mind that the decompression memory
|
||||
|
@@ -265,38 +312,40 @@ command line.
|
|||
|
||||
'-S BYTES'
|
||||
'--volume-size=BYTES'
|
||||
Split the compressed output into several volume files with names
|
||||
'original_name00001.lz', 'original_name00002.lz', etc, and set the
|
||||
volume size limit to BYTES. Each volume is a complete, maybe
|
||||
multimember, lzip file. A small volume size may degrade compression
|
||||
ratio, so use it only when needed. Valid values range from 100 kB
|
||||
to 4 EiB.
|
||||
When compressing, split the compressed output into several volume
|
||||
files with names 'original_name00001.lz', 'original_name00002.lz',
|
||||
etc, and set the volume size limit to BYTES. Input files are kept
|
||||
unchanged. Each volume is a complete, maybe multimember, lzip
|
||||
file. A small volume size may degrade compression ratio, so use it
|
||||
only when needed. Valid values range from 100 kB to 4 EiB.
|
||||
|
||||
'-t'
|
||||
'--test'
|
||||
Check integrity of the specified file(s), but don't decompress
|
||||
them. This really performs a trial decompression and throws away
|
||||
the result. Use it together with '-v' to see information about
|
||||
the file(s). If a file fails the test, does not exist, can't be
|
||||
opened, or is a terminal, clzip continues checking the rest of the
|
||||
files.
|
||||
Check integrity of the specified files, but don't decompress them.
|
||||
This really performs a trial decompression and throws away the
|
||||
result. Use it together with '-v' to see information about the
|
||||
files. If a file fails the test, does not exist, can't be opened,
|
||||
or is a terminal, clzip continues checking the rest of the files.
|
||||
A final diagnostic is shown at verbosity level 1 or higher if any
|
||||
file fails the test when testing multiple files.
|
||||
|
||||
'-v'
|
||||
'--verbose'
|
||||
Verbose mode.
|
||||
When compressing, show the compression ratio for each file
|
||||
processed. A second '-v' shows the progress of compression.
|
||||
When compressing, show the compression ratio and size for each file
|
||||
processed.
|
||||
When decompressing or testing, further -v's (up to 4) increase the
|
||||
verbosity level, showing status, compression ratio, dictionary
|
||||
size, trailer contents (CRC, data size, member size), and up to 6
|
||||
bytes of trailing data (if any) both in hexadecimal and as a
|
||||
string of printable ASCII characters.
|
||||
Two or more '-v' options show the progress of (de)compression.
|
||||
|
||||
'-0 .. -9'
|
||||
Set the compression parameters (dictionary size and match length
|
||||
limit) as shown in the table below. The default compression level
|
||||
is '-6'. Note that '-9' can be much slower than '-0'. These
|
||||
options have no effect when decompressing.
|
||||
options have no effect when decompressing, testing or listing.
|
||||
|
||||
The bidimensional parameter space of LZMA can't be mapped to a
|
||||
linear scale optimal for all files. If your files are large, very
|
||||
|
@@ -319,6 +368,13 @@ command line.
|
|||
'--best'
|
||||
Aliases for GNU gzip compatibility.
|
||||
|
||||
'--loose-trailing'
|
||||
When decompressing, testing or listing, allow trailing data whose
|
||||
first bytes are so similar to the magic bytes of a lzip header
|
||||
that they can be confused with a corrupt header. Use this option
|
||||
if a file triggers a "corrupt header" error and the cause is not
|
||||
indeed a corrupt header.
|
||||
|
||||
|
||||
Numbers given as arguments to options may be followed by a multiplier
|
||||
and an optional 'B' for "byte".
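
Two of the rules stated in the option descriptions above lend themselves to a short sketch: the interpretation of small '-s' values as powers of two, and the file names produced by '-o' in the 1.10 text. The Python below is a minimal illustration under stated assumptions. All function and constant names are hypothetical, multiplier suffixes and the quantization of dictionary sizes are not modeled, and the behavior of volume names when FILE already ends in '.lz' is not covered because the text does not specify it.

```python
MIN_DICT_SIZE = 1 << 12  # 4 KiB, lower bound given for '-s'
MAX_DICT_SIZE = 1 << 29  # 512 MiB, upper bound given for '-s'


def dictionary_size_limit(value: int) -> int:
    """Interpret a plain numeric '-s' argument (hypothetical helper)."""
    if 12 <= value <= 29:
        # Values 12 to 29 are exponents, meaning 2^12 to 2^29 bytes.
        return 1 << value
    if MIN_DICT_SIZE <= value <= MAX_DICT_SIZE:
        # Rounding to one of the valid quantized sizes is not modeled here.
        return value
    raise ValueError(f"invalid dictionary size: {value}")


def compressed_output_name(name: str) -> str:
    """Name produced by '-o NAME' when compressing, per the 1.10 wording."""
    if name.endswith((".lz", ".tlz")):
        return name  # a second '.lz' extension is not added
    return name + ".lz"


def volume_name(name: str, number: int) -> str:
    """Name of volume NUMBER when splitting output: NAME00001.lz, etc."""
    return f"{name}{number:05d}.lz"
```

For example, `dictionary_size_limit(23)` yields 8388608 (8 MiB, the default shown in the man page), and `volume_name("foo", 2)` yields `foo00002.lz`.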
|
||||
|
@@ -344,7 +400,7 @@ caused clzip to panic.
|
|||
|
||||
File: clzip.info, Node: Quality assurance, Next: File format, Prev: Invoking clzip, Up: Top
|
||||
|
||||
3 Design, development and testing of lzip
|
||||
4 Design, development and testing of lzip
|
||||
*****************************************
|
||||
|
||||
There are two ways of constructing a software design: One way is to make
|
||||
|
@@ -359,7 +415,7 @@ describes the lessons learned from previous compressors (gzip and
|
|||
bzip2), and their application to the design of lzip.
|
||||
|
||||
|
||||
3.1 Format design
|
||||
4.1 Format design
|
||||
=================
|
||||
|
||||
When gzip was designed in 1992, computers and operating systems were
|
||||
|
@@ -377,7 +433,7 @@ one of gzip.
|
|||
|
||||
Probably the worst defect of the gzip format from the point of view
|
||||
of data safety is the variable size of its header. If the byte at
|
||||
offset 3 (flags) of a gzip member gets corrupted, it may become very
|
||||
offset 3 (flags) of a gzip member gets corrupted, it may become
|
||||
difficult to recover the data, even if the compressed blocks are
|
||||
intact, because it can't be known with certainty where the compressed
|
||||
blocks begin.
|
||||
|
@@ -399,8 +455,8 @@ error detection. Any distance larger than the dictionary size acts as a
|
|||
forbidden symbol, allowing the decompressor to detect the approximate
|
||||
position of errors, and leaving very little work for the check sequence
|
||||
(CRC and data sizes) in the detection of errors. Lzip is usually able
|
||||
to detect all possible bit-flips in the compressed data without
|
||||
resorting to the check sequence. It would be very difficult to write an
|
||||
to detect all possible bit flips in the compressed data without
|
||||
resorting to the check sequence. It would be difficult to write an
|
||||
automatic recovery tool like lziprecover for the gzip format. And, as
|
||||
far as I know, it has never been written.
|
||||
|
||||
|
@@ -409,15 +465,14 @@ decompressed data because it provides more accurate error detection than
|
|||
CRC64 up to a compressed size of about 16 GiB, a size larger than that
|
||||
of most files. In the case of lzip, the additional detection capability
|
||||
of the decompressor reduces the probability of undetected errors more
|
||||
than a million times, making CRC32 more accurate than CRC64 up to about
|
||||
20 PiB of compressed size.
|
||||
than a million times beyond what the CRC32 alone provides.
|
||||
|
||||
The lzip format is designed for long-term archiving. Therefore it
|
||||
excludes any unneeded features that may interfere with the future
|
||||
extraction of the uncompressed data.
|
||||
extraction of the decompressed data.
|
||||
|
||||
|
||||
3.1.1 Gzip format (mis)features not present in lzip
|
||||
4.1.1 Gzip format (mis)features not present in lzip
|
||||
---------------------------------------------------
|
||||
|
||||
'Multiple algorithms'
|
||||
|
@@ -438,16 +493,22 @@ extraction of the uncompressed data.
|
|||
compressed blocks.
|
||||
|
||||
'Optional CRC for the header'
|
||||
Using an optional checksum for the header is not only a bad idea,
|
||||
it is an error; it may prevent the extraction of perfectly good
|
||||
data. For example, if the checksum is used and the bit enabling it
|
||||
is reset by a bit-flip, the header will appear to be intact (in
|
||||
spite of being corrupt) while the compressed blocks will appear to
|
||||
be totally unrecoverable (in spite of being intact). Very
|
||||
misleading indeed.
|
||||
Using an optional CRC for the header is not only a bad idea, it is
|
||||
an error; it circumvents the HD of the CRC and may prevent the
|
||||
extraction of perfectly good data. For example, if the CRC is used
|
||||
and the bit enabling it is reset by a bit flip, the header will
|
||||
appear to be intact (in spite of being corrupt) while the
|
||||
compressed blocks will appear to be totally unrecoverable (in
|
||||
spite of being intact). Very misleading indeed.
|
||||
|
||||
'Metadata'
|
||||
The gzip format stores some metadata, like the modification time
|
||||
of the original file or the operating system on which compression
|
||||
took place. This complicates reproducible compression (obtaining
|
||||
identical compressed output from identical input).
|
||||
|
||||
|
||||
3.1.2 Lzip format improvements over gzip and bzip2
|
||||
4.1.2 Lzip format improvements over gzip and bzip2
|
||||
--------------------------------------------------
|
||||
|
||||
'64-bit size field'
|
||||
|
@@ -475,7 +536,7 @@ extraction of the uncompressed data.
|
|||
total uncompressed size.
|
||||
|
||||
|
||||
3.2 Quality of implementation
|
||||
4.2 Quality of implementation
|
||||
=============================
|
||||
|
||||
'Accurate and robust error detection'
|
||||
|
@@ -521,7 +582,7 @@ extraction of the uncompressed data.
|
|||
|
||||
File: clzip.info, Node: File format, Next: Algorithm, Prev: Quality assurance, Up: Top
|
||||
|
||||
4 File format
|
||||
5 File format
|
||||
*************
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
|
@@ -592,7 +653,7 @@ additional information before, between, or after them.
|
|||
|
||||
File: clzip.info, Node: Algorithm, Next: Stream format, Prev: File format, Up: Top
|
||||
|
||||
5 Algorithm
|
||||
6 Algorithm
|
||||
***********
|
||||
|
||||
In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
|
||||
|
@@ -658,7 +719,7 @@ LZMA), and Julian Seward (for bzip2's CLI).
|
|||
|
||||
File: clzip.info, Node: Stream format, Next: Trailing data, Prev: Algorithm, Up: Top
|
||||
|
||||
6 Format of the LZMA stream in lzip files
|
||||
7 Format of the LZMA stream in lzip files
|
||||
*****************************************
|
||||
|
||||
The LZMA algorithm has three parameters, called "special LZMA
|
||||
|
@@ -698,7 +759,7 @@ the lzip download directory. The source code of lzd is included in
|
|||
appendix A. *Note Reference source code::.
|
||||
|
||||
|
||||
6.1 What is coded
|
||||
7.1 What is coded
|
||||
=================
|
||||
|
||||
The LZMA stream includes literals, matches and repeated matches (matches
|
||||
|
@@ -773,7 +834,7 @@ slot + direct_bits distances from 4 to 127
|
|||
slot + (direct_bits - 4) + 4 bits distances from 128 to 2^32 - 1
|
||||
|
||||
|
||||
6.2 The coding contexts
|
||||
7.2 The coding contexts
|
||||
=======================
|
||||
|
||||
These contexts ('Bit_model' in the source), are integers or arrays of
|
||||
|
@@ -863,7 +924,7 @@ difference is found, the rest of the byte is decoded using the normal
|
|||
bit tree context. (See 'decode_matched' in the source).
|
||||
|
||||
|
||||
6.3 The range decoder
|
||||
7.3 The range decoder
|
||||
=====================
|
||||
|
||||
The LZMA stream is consumed one byte at a time by the range decoder.
|
||||
|
@@ -883,7 +944,7 @@ range decoder. This is done by shifting 5 bytes in the initialization of
|
|||
source).
|
||||
|
||||
|
||||
6.4 Decoding the LZMA stream
|
||||
7.4 Decoding the LZMA stream
|
||||
============================
|
||||
|
||||
After decoding the member header and obtaining the dictionary size, the
|
||||
|
@@ -896,7 +957,7 @@ Stream" marker is decoded.
|
|||
|
||||
File: clzip.info, Node: Trailing data, Next: Examples, Prev: Stream format, Up: Top
|
||||
|
||||
7 Extra data appended to the file
|
||||
8 Extra data appended to the file
|
||||
*********************************
|
||||
|
||||
Sometimes extra data are found appended to a lzip file after the last
|
||||
|
@@ -908,10 +969,11 @@ member. Such trailing data may be:
|
|||
|
||||
* Useful data added by the user; a cryptographically secure hash, a
|
||||
description of file contents, etc. It is safe to append any amount
|
||||
of text to a lzip file as long as the text does not begin with the
|
||||
string "LZIP", and does not contain any zero bytes (null
|
||||
characters). Nonzero bytes and zero bytes can't be safely mixed in
|
||||
trailing data.
|
||||
of text to a lzip file as long as none of the first four bytes of
|
||||
the text match the corresponding byte in the string "LZIP", and
|
||||
the text does not contain any zero bytes (null characters).
|
||||
Nonzero bytes and zero bytes can't be safely mixed in trailing
|
||||
data.
|
||||
|
||||
* Garbage added by some not totally successful copy operation.
|
||||
|
||||
|
@@ -919,12 +981,17 @@ member. Such trailing data may be:
|
|||
and hash value (for a chosen hash) coincide with those of another
|
||||
file.
|
||||
|
||||
* In very rare cases, trailing data could be the corrupt header of
|
||||
another member. In multimember or concatenated files the
|
||||
probability of corruption happening in the magic bytes is 5 times
|
||||
smaller than the probability of getting a false positive caused by
|
||||
the corruption of the integrity information itself. Therefore it
|
||||
can be considered to be below the noise level.
|
||||
* In rare cases, trailing data could be the corrupt header of another
|
||||
member. In multimember or concatenated files the probability of
|
||||
corruption happening in the magic bytes is 5 times smaller than the
|
||||
probability of getting a false positive caused by the corruption
|
||||
of the integrity information itself. Therefore it can be
|
||||
considered to be below the noise level. Additionally, the test
|
||||
used by clzip to discriminate trailing data from a corrupt header
|
||||
has a Hamming distance (HD) of 3, and the 3 bit flips must happen
|
||||
in different magic bytes for the test to fail. In any case, the
|
||||
option '--trailing-error' guarantees that any corrupt header will
|
||||
be detected.
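
The two conditions that the 1.10 wording places on user-appended text (none of the first four bytes may match the corresponding byte of "LZIP", and no zero bytes anywhere) can be sketched as a small check. The Python below is an illustration only: the function name `is_safe_trailing_text` is hypothetical, and the code mirrors the manual's wording rather than clzip's actual header test.

```python
LZIP_MAGIC = b"LZIP"  # magic string named in the manual text


def is_safe_trailing_text(text: bytes) -> bool:
    """Apply the manual's two conditions to text meant to be appended."""
    # None of the first four bytes may equal the byte at the same
    # position in the magic string "LZIP".
    for position, byte in enumerate(text[:4]):
        if byte == LZIP_MAGIC[position]:
            return False
    # The text must not contain any zero bytes (null characters).
    return 0 not in text
```

Under this reading, a text such as `b"sha256: ..."` passes, while `b"xZyy"` fails because its second byte matches the 'Z' of the magic string. The 1.9 wording only rejected text that began with the whole string "LZIP".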
|
||||
|
||||
Trailing data are in no way part of the lzip file format, but tools
|
||||
reading lzip files are expected to behave as correctly and usefully as
|
||||
|
@@ -938,7 +1005,7 @@ cases where a file containing trailing data must be rejected, the option
|
|||
|
||||
File: clzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top
|
||||
|
||||
8 A small tutorial with examples
|
||||
9 A small tutorial with examples
|
||||
********************************
|
||||
|
||||
WARNING! Even if clzip is bug-free, other causes may result in a corrupt
|
||||
|
@@ -1020,8 +1087,8 @@ file with a member size of 32 MiB.
|
|||
|
||||
File: clzip.info, Node: Problems, Next: Reference source code, Prev: Examples, Up: Top
|
||||
|
||||
9 Reporting bugs
|
||||
****************
|
||||
10 Reporting bugs
|
||||
*****************
|
||||
|
||||
There are probably bugs in clzip. There are certainly errors and
|
||||
omissions in this manual. If you report them, they will get fixed. If
|
||||
|
@@ -1039,7 +1106,7 @@ Appendix A Reference source code
|
|||
********************************
|
||||
|
||||
/* Lzd - Educational decompressor for the lzip format
|
||||
Copyright (C) 2013-2017 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@@ -1355,9 +1422,9 @@ bool LZ_decoder::decode_member() // Returns false if error
|
|||
Bit_model bm_align[dis_align_size];
|
||||
Len_model match_len_model;
|
||||
Len_model rep_len_model;
|
||||
unsigned rep0 = 0; // rep[0-3] latest four distances
|
||||
unsigned rep1 = 0; // used for efficient coding of
|
||||
unsigned rep2 = 0; // repeated distances
|
||||
unsigned rep0 = 0; // rep[0-3] latest four distances
|
||||
unsigned rep1 = 0; // used for efficient coding of
|
||||
unsigned rep2 = 0; // repeated distances
|
||||
unsigned rep3 = 0;
|
||||
State state;
|
||||
|
||||
|
@ -1452,7 +1519,7 @@ int main( const int argc, const char * const argv[] )
|
|||
"It is not safe to use lzd for any real work.\n"
|
||||
"\nUsage: %s < file.lz > file\n", argv[0] );
|
||||
std::printf( "Lzd decompresses from standard input to standard output.\n"
|
||||
"\nCopyright (C) 2017 Antonio Diaz Diaz.\n"
|
||||
"\nCopyright (C) 2018 Antonio Diaz Diaz.\n"
|
||||
"This is free software: you are free to change and redistribute it.\n"
|
||||
"There is NO WARRANTY, to the extent permitted by law.\n"
|
||||
"Report bugs to lzip-bug@nongnu.org\n"
|
||||
|
@ -1497,7 +1564,7 @@ int main( const int argc, const char * const argv[] )
|
|||
}
|
||||
|
||||
if( std::fclose( stdout ) != 0 )
|
||||
{ std::fprintf( stderr, "Can't close stdout: %s\n", std::strerror( errno ) );
|
||||
{ std::fprintf( stderr, "Error closing stdout: %s\n", std::strerror( errno ) );
|
||||
return 1; }
|
||||
return 0;
|
||||
}
|
||||
|
@ -1520,6 +1587,7 @@ Concept index
|
|||
* introduction: Introduction. (line 6)
|
||||
* invoking: Invoking clzip. (line 6)
|
||||
* options: Invoking clzip. (line 6)
|
||||
* output: Output. (line 6)
|
||||
* quality assurance: Quality assurance. (line 6)
|
||||
* reference source code: Reference source code. (line 6)
|
||||
* trailing data: Trailing data. (line 6)
|
||||
|
@ -1530,19 +1598,20 @@ Concept index
|
|||
|
||||
Tag Table:
|
||||
Node: Top210
|
||||
Node: Introduction1154
|
||||
Node: Invoking clzip6630
|
||||
Ref: --trailing-error7202
|
||||
Node: Quality assurance14125
|
||||
Node: File format22281
|
||||
Node: Algorithm24686
|
||||
Node: Stream format27516
|
||||
Node: Trailing data38257
|
||||
Node: Examples40159
|
||||
Ref: concat-example41341
|
||||
Node: Problems42386
|
||||
Node: Reference source code42920
|
||||
Node: Concept index57238
|
||||
Node: Introduction1210
|
||||
Node: Output6491
|
||||
Node: Invoking clzip8011
|
||||
Ref: --trailing-error8577
|
||||
Node: Quality assurance16230
|
||||
Node: File format24640
|
||||
Node: Algorithm27045
|
||||
Node: Stream format29875
|
||||
Node: Trailing data40616
|
||||
Node: Examples42894
|
||||
Ref: concat-example44076
|
||||
Node: Problems45121
|
||||
Node: Reference source code45657
|
||||
Node: Concept index59974
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|
235
doc/clzip.texi
|
@@ -6,8 +6,8 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 13 April 2017
|
||||
@set VERSION 1.9
|
||||
@set UPDATED 6 February 2018
|
||||
@set VERSION 1.10
|
||||
|
||||
@dircategory Data Compression
|
||||
@direntry
|
||||
|
@@ -36,6 +36,7 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
|
|||
|
||||
@menu
|
||||
* Introduction:: Purpose and features of clzip
|
||||
* Output:: Meaning of clzip's output
|
||||
* Invoking clzip:: Command line interface
|
||||
* Quality assurance:: Design, development and testing of lzip
|
||||
* File format:: Detailed format of the compressed file
|
||||
|
@@ -49,7 +50,7 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
|
|||
@end menu
|
||||
|
||||
@sp 1
|
||||
Copyright @copyright{} 2010-2017 Antonio Diaz Diaz.
|
||||
Copyright @copyright{} 2010-2018 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission
|
||||
to copy, distribute and modify it.
|
||||
|
@@ -79,7 +80,7 @@ availability:
|
|||
The lzip format provides very safe integrity checking and some data
|
||||
recovery means. The
|
||||
@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
|
||||
program can repair bit-flip errors (one of the most common forms of data
|
||||
program can repair bit flip errors (one of the most common forms of data
|
||||
corruption) in lzip files, and provides data recovery capabilities,
|
||||
including error-checked merging of damaged copies of a file.
|
||||
@ifnothtml
|
||||
|
@@ -128,9 +129,9 @@ choice of dictionary size limit.
|
|||
The amount of memory required for compression is about 1 or 2 times the
|
||||
dictionary size limit (1 if input file size is less than dictionary size
|
||||
limit, else 2) plus 9 times the dictionary size really used. The option
|
||||
@samp{-0} is special and only requires about 1.5 MiB at most. The amount
|
||||
of memory required for decompression is about 46 kB larger than the
|
||||
dictionary size really used.
|
||||
@samp{-0} is special and only requires about @w{1.5 MiB} at most. The
|
||||
amount of memory required for decompression is about @w{46 kB} larger
|
||||
than the dictionary size really used.
|
||||
|
||||
When compressing, clzip replaces every file given in the command line
|
||||
with a compressed version of itself, with the name "original_name.lz".
|
||||
|
@@ -159,7 +160,7 @@ incomprehensible and therefore pointless.
|
|||
|
||||
Clzip will correctly decompress a file which is the concatenation of two
|
||||
or more compressed files. The result is the concatenation of the
|
||||
corresponding uncompressed files. Integrity testing of concatenated
|
||||
corresponding decompressed files. Integrity testing of concatenated
|
||||
compressed files is also supported.
|
||||
|
||||
Clzip can produce multimember files, and lziprecover can safely recover
|
||||
|
@@ -170,7 +171,53 @@ compressed tar archives.
|
|||
|
||||
Clzip is able to compress and decompress streams of unlimited size by
|
||||
automatically creating multimember output. The members so created are
|
||||
large, about 2 PiB each.
|
||||
large, about @w{2 PiB} each.
|
||||
|
||||
|
||||
@node Output
|
||||
@chapter Meaning of clzip's output
|
||||
@cindex output
|
||||
|
||||
The output of clzip looks like this:
|
||||
|
||||
@example
|
||||
clzip -v foo
|
||||
foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
|
||||
|
||||
clzip -tvv foo.lz
|
||||
foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
|
||||
@end example
|
||||
|
||||
The meaning of each field is as follows:
|
||||
|
||||
@table @code
|
||||
@item N:1
|
||||
The compression ratio @w{(uncompressed_size / compressed_size)}, shown
|
||||
as N to 1.
|
||||
|
||||
@item ratio
|
||||
The inverse compression ratio @w{(compressed_size / uncompressed_size)},
|
||||
shown as a percentage. A decimal ratio is easily obtained by moving the
|
||||
decimal point two places to the left; @w{14.98% = 0.1498}.
|
||||
|
||||
@item saved
|
||||
The space saved by compression @w{(1 - ratio)}, shown as a percentage.
|
||||
|
||||
@item in
|
||||
The size of the uncompressed data. When decompressing or testing, it is
|
||||
shown as @code{decompressed}. Note that clzip always prints the
|
||||
uncompressed size before the compressed size when compressing,
|
||||
decompressing, testing or listing.
|
||||
|
||||
@item out
|
||||
The size of the compressed data. When decompressing or testing, it is
|
||||
shown as @code{compressed}.
|
||||
|
||||
@end table
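
The three derived fields in the table above follow directly from the two sizes. The sketch below recomputes them; the struct and function names are illustrative assumptions, and the code does not claim to reproduce clzip's own formatting.

```cpp
#include <cassert>

// Illustrative only: the three derived fields described in the table above.
struct Compression_stats
  {
  double n_to_1;	// uncompressed_size / compressed_size
  double ratio_pct;	// compressed_size / uncompressed_size, as a percentage
  double saved_pct;	// 1 - ratio, as a percentage
  };

// Hypothetical helper, not part of clzip.
Compression_stats compute_stats( const unsigned long long in_size,
                                 const unsigned long long out_size )
  {
  assert( in_size > 0 && out_size > 0 );
  Compression_stats s;
  s.n_to_1 = (double)in_size / (double)out_size;
  s.ratio_pct = 100.0 * (double)out_size / (double)in_size;
  s.saved_pct = 100.0 - s.ratio_pct;
  return s;
  }
```

With the sizes from the example (450560 in, 67493 out), the values round to 6.676:1, 14.98% and 85.02%, which agrees with the sample output shown earlier.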
|
||||
|
||||
When decompressing or testing at verbosity level 4 (-vvvv), the
|
||||
dictionary size used to compress the file and the CRC32 of the
|
||||
uncompressed data are also shown.
|
||||
|
||||
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
|
||||
have been compressed. Decompressed is used to refer to data which have
|
||||
|
@ -195,7 +242,7 @@ clzip [@var{options}] [@var{files}]
|
|||
mixed with other @var{files} and is read just once, the first time it
|
||||
appears in the command line.
|
||||
|
||||
Clzip supports the following options:
|
||||
clzip supports the following options:
|
||||
|
||||
@table @code
|
||||
@item -h
|
||||
|
@ -215,24 +262,24 @@ garbage that can be safely ignored. @xref{concat-example}.
|
|||
|
||||
@item -b @var{bytes}
|
||||
@itemx --member-size=@var{bytes}
|
||||
Set the member size limit to @var{bytes}. A small member size may
|
||||
degrade compression ratio, so use it only when needed. Valid values
|
||||
range from 100 kB to 2 PiB. Defaults to 2 PiB.
|
||||
When compressing, set the member size limit to @var{bytes}. A small
|
||||
member size may degrade compression ratio, so use it only when needed.
|
||||
Valid values range from @w{100 kB} to @w{2 PiB}. Defaults to @w{2 PiB}.
|
||||
|
||||
@item -c
|
||||
@itemx --stdout
|
||||
Compress or decompress to standard output; keep input files unchanged.
|
||||
If compressing several files, each file is compressed independently.
|
||||
This option is needed when reading from a named pipe (fifo) or from a
|
||||
device. Use it also to recover as much of the uncompressed data as
|
||||
device. Use it also to recover as much of the decompressed data as
|
||||
possible when decompressing a corrupt file.
|
||||
|
||||
@item -d
|
||||
@itemx --decompress
|
||||
Decompress the specified file(s). If a file does not exist or can't be
|
||||
Decompress the specified files. If a file does not exist or can't be
|
||||
opened, clzip continues decompressing the rest of the files. If a file
|
||||
fails to decompress, clzip exits immediately without decompressing the
|
||||
rest of the files.
|
||||
fails to decompress, or is a terminal, clzip exits immediately without
|
||||
decompressing the rest of the files.
|
||||
|
||||
@item -f
|
||||
@itemx --force
|
||||
|
@ -240,8 +287,8 @@ Force overwrite of output files.
|
|||
|
||||
@item -F
|
||||
@itemx --recompress
|
||||
Force re-compression of files whose name already has the @samp{.lz} or
|
||||
@samp{.tlz} suffix.
|
||||
When compressing, force re-compression of files whose name already has
|
||||
the @samp{.lz} or @samp{.tlz} suffix.
|
||||
|
||||
@item -k
|
||||
@itemx --keep
|
||||
|
@ -250,7 +297,7 @@ Keep (don't delete) input files during compression or decompression.
|
|||
@item -l
|
||||
@itemx --list
|
||||
Print the uncompressed size, compressed size and percentage saved of the
|
||||
specified file(s). Trailing data are ignored. The values produced are
|
||||
specified files. Trailing data are ignored. The values produced are
|
||||
correct even for multimember files. If more than one file is given, a
|
||||
final line containing the cumulative sizes is printed. With @samp{-v},
|
||||
the dictionary size, the number of members in the file, and the amount
|
||||
|
@ -263,18 +310,21 @@ verifies that none of the specified files contain trailing data.
|
|||
|
||||
@item -m @var{bytes}
|
||||
@itemx --match-length=@var{bytes}
|
||||
Set the match length limit in bytes. After a match this long is found,
|
||||
the search is finished. Valid values range from 5 to 273. Larger values
|
||||
usually give better compression ratios but longer compression times.
|
||||
When compressing, set the match length limit in bytes. After a match
|
||||
this long is found, the search is finished. Valid values range from 5 to
|
||||
273. Larger values usually give better compression ratios but longer
|
||||
compression times.
|
||||
|
||||
@item -o @var{file}
|
||||
@itemx --output=@var{file}
|
||||
When reading from standard input and @samp{--stdout} has not been
|
||||
specified, use @samp{@var{file}} as the virtual name of the uncompressed
|
||||
file. This produces a file named @samp{@var{file}} when decompressing, a
|
||||
file named @samp{@var{file}.lz} when compressing, and several files
|
||||
named @samp{@var{file}00001.lz}, @samp{@var{file}00002.lz}, etc, when
|
||||
compressing and splitting the output in volumes.
|
||||
file. This produces a file named @samp{@var{file}} when decompressing,
|
||||
or a file named @samp{@var{file}.lz} when compressing. A second
|
||||
@samp{.lz} extension is not added if @samp{@var{file}} already ends in
|
||||
@samp{.lz} or @samp{.tlz}. When compressing and splitting the output in
|
||||
volumes, several files named @samp{@var{file}00001.lz},
|
||||
@samp{@var{file}00002.lz}, etc, are created.
|
||||
|
||||
@item -q
|
||||
@itemx --quiet
|
||||
|
@ -282,12 +332,12 @@ Quiet operation. Suppress all messages.
|
|||
|
||||
@item -s @var{bytes}
|
||||
@itemx --dictionary-size=@var{bytes}
|
||||
Set the dictionary size limit in bytes. Clzip will use the smallest
|
||||
possible dictionary size for each file without exceeding this limit.
|
||||
Valid values range from 4 KiB to 512 MiB. Values 12 to 29 are
|
||||
interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note that
|
||||
dictionary sizes are quantized. If the specified size does not match one
|
||||
of the valid sizes, it will be rounded upwards by adding up to
|
||||
When compressing, set the dictionary size limit in bytes. Clzip will use
|
||||
the smallest possible dictionary size for each file without exceeding
|
||||
this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
|
||||
to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
|
||||
that dictionary sizes are quantized. If the specified size does not
|
||||
match one of the valid sizes, it will be rounded upwards by adding up to
|
||||
@w{(@var{bytes} / 8)} to it.
|
||||
|
||||
For maximum compression you should use a dictionary size limit as large
|
||||
|
@ -296,37 +346,40 @@ is affected at compression time by the choice of dictionary size limit.
|
|||
|
||||
@item -S @var{bytes}
|
||||
@itemx --volume-size=@var{bytes}
|
||||
Split the compressed output into several volume files with names
|
||||
@samp{original_name00001.lz}, @samp{original_name00002.lz}, etc, and set
|
||||
the volume size limit to @var{bytes}. Each volume is a complete, maybe
|
||||
multimember, lzip file. A small volume size may degrade compression
|
||||
ratio, so use it only when needed. Valid values range from 100 kB to 4
|
||||
EiB.
|
||||
When compressing, split the compressed output into several volume files
|
||||
with names @samp{original_name00001.lz}, @samp{original_name00002.lz},
|
||||
etc, and set the volume size limit to @var{bytes}. Input files are kept
|
||||
unchanged. Each volume is a complete, maybe multimember, lzip file. A
|
||||
small volume size may degrade compression ratio, so use it only when
|
||||
needed. Valid values range from @w{100 kB} to @w{4 EiB}.
|
||||
|
||||
@item -t
|
||||
@itemx --test
|
||||
Check integrity of the specified file(s), but don't decompress them.
|
||||
This really performs a trial decompression and throws away the result.
|
||||
Use it together with @samp{-v} to see information about the file(s). If
|
||||
a file fails the test, does not exist, can't be opened, or is a
|
||||
terminal, clzip continues checking the rest of the files.
|
||||
Check integrity of the specified files, but don't decompress them. This
|
||||
really performs a trial decompression and throws away the result. Use it
|
||||
together with @samp{-v} to see information about the files. If a file
|
||||
fails the test, does not exist, can't be opened, or is a terminal, clzip
|
||||
continues checking the rest of the files. A final diagnostic is shown at
|
||||
verbosity level 1 or higher if any file fails the test when testing
|
||||
multiple files.
|
||||
|
||||
@item -v
|
||||
@itemx --verbose
|
||||
Verbose mode.@*
|
||||
When compressing, show the compression ratio for each file processed. A
|
||||
second @samp{-v} shows the progress of compression.@*
|
||||
When compressing, show the compression ratio and size for each file
|
||||
processed.@*
|
||||
When decompressing or testing, further -v's (up to 4) increase the
|
||||
verbosity level, showing status, compression ratio, dictionary size,
|
||||
trailer contents (CRC, data size, member size), and up to 6 bytes of
|
||||
trailing data (if any) both in hexadecimal and as a string of printable
|
||||
ASCII characters.
|
||||
ASCII characters.@*
|
||||
Two or more @samp{-v} options show the progress of (de)compression.
|
||||
|
||||
@item -0 .. -9
|
||||
Set the compression parameters (dictionary size and match length limit)
|
||||
as shown in the table below. The default compression level is @samp{-6}.
|
||||
Note that @samp{-9} can be much slower than @samp{-0}. These options
|
||||
have no effect when decompressing.
|
||||
have no effect when decompressing, testing or listing.
|
||||
|
||||
The bidimensional parameter space of LZMA can't be mapped to a linear
|
||||
scale optimal for all files. If your files are large, very repetitive,
|
||||
|
@ -351,6 +404,12 @@ etc, you may need to use the @samp{--dictionary-size} and
|
|||
@itemx --best
|
||||
Aliases for GNU gzip compatibility.
|
||||
|
||||
@item --loose-trailing
|
||||
When decompressing, testing or listing, allow trailing data whose first
|
||||
bytes are so similar to the magic bytes of a lzip header that they can
|
||||
be confused with a corrupt header. Use this option if a file triggers a
|
||||
"corrupt header" error and the cause is not indeed a corrupt header.
|
||||
|
||||
@end table
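
As an illustration of how the argument of @samp{--dictionary-size} is interpreted according to the description above, the following sketch maps a numeric argument to a size in bytes. It is an assumption-laden outline, not clzip's parser: the names are hypothetical, multiplier suffixes are not handled, and the quantization step (rounding upwards to the nearest valid size) is deliberately left out.

```cpp
#include <cassert>
#include <stdexcept>

const unsigned long min_dictionary_size = 1UL << 12;	// 4 KiB
const unsigned long max_dictionary_size = 1UL << 29;	// 512 MiB

// Hypothetical helper, not part of clzip: values 12 to 29 are taken as
// powers of two; any other value must already be a size in bytes within
// the valid range. Quantization of the result is not modelled here.
unsigned long dictionary_size_from_arg( const unsigned long arg )
  {
  if( arg >= 12 && arg <= 29 ) return 1UL << arg;
  if( arg < min_dictionary_size || arg > max_dictionary_size )
    throw std::out_of_range( "dictionary size out of range" );
  return arg;
  }
```

Under these assumptions an argument of 12 yields 4096 bytes, 29 yields 536870912 bytes, and 65536 is returned unchanged.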
|
||||
|
||||
Numbers given as arguments to options may be followed by a multiplier
|
||||
|
@ -410,7 +469,7 @@ of gzip.
|
|||
|
||||
Probably the worst defect of the gzip format from the point of view of
|
||||
data safety is the variable size of its header. If the byte at offset 3
|
||||
(flags) of a gzip member gets corrupted, it may become very difficult to
|
||||
(flags) of a gzip member gets corrupted, it may become difficult to
|
||||
recover the data, even if the compressed blocks are intact, because it
|
||||
can't be known with certainty where the compressed blocks begin.
|
||||
|
||||
|
@ -431,22 +490,21 @@ distance larger than the dictionary size acts as a forbidden symbol,
|
|||
allowing the decompressor to detect the approximate position of errors,
|
||||
and leaving very little work for the check sequence (CRC and data sizes)
|
||||
in the detection of errors. Lzip is usually able to detect all possible
|
||||
bit-flips in the compressed data without resorting to the check
|
||||
sequence. It would be very difficult to write an automatic recovery tool
|
||||
like lziprecover for the gzip format. And, as far as I know, it has
|
||||
never been written.
|
||||
bit flips in the compressed data without resorting to the check
|
||||
sequence. It would be difficult to write an automatic recovery tool like
|
||||
lziprecover for the gzip format. And, as far as I know, it has never
|
||||
been written.
|
||||
|
||||
Lzip, like gzip and bzip2, uses a CRC32 to check the integrity of the
|
||||
decompressed data because it provides more accurate error detection than
|
||||
CRC64 up to a compressed size of about 16 GiB, a size larger than that
|
||||
of most files. In the case of lzip, the additional detection capability
|
||||
of the decompressor reduces the probability of undetected errors more
|
||||
than a million times, making CRC32 more accurate than CRC64 up to about
|
||||
20 PiB of compressed size.
|
||||
CRC64 up to a compressed size of about @w{16 GiB}, a size larger than
|
||||
that of most files. In the case of lzip, the additional detection
|
||||
capability of the decompressor reduces the probability of undetected
|
||||
errors more than a million times beyond what the CRC32 alone provides.
|
||||
|
||||
The lzip format is designed for long-term archiving. Therefore it
|
||||
excludes any unneeded features that may interfere with the future
|
||||
extraction of the uncompressed data.
|
||||
extraction of the decompressed data.
|
||||
|
||||
@sp 1
|
||||
@subsection Gzip format (mis)features not present in lzip
|
||||
|
@ -472,12 +530,20 @@ header CRC nor the compressed blocks.
|
|||
|
||||
@item Optional CRC for the header
|
||||
|
||||
Using an optional checksum for the header is not only a bad idea, it is
|
||||
an error; it may prevent the extraction of perfectly good data. For
|
||||
example, if the checksum is used and the bit enabling it is reset by a
|
||||
bit-flip, the header will appear to be intact (in spite of being
|
||||
corrupt) while the compressed blocks will appear to be totally
|
||||
unrecoverable (in spite of being intact). Very misleading indeed.
|
||||
Using an optional CRC for the header is not only a bad idea, it is an
|
||||
error; it circumvents the HD of the CRC and may prevent the extraction
|
||||
of perfectly good data. For example, if the CRC is used and the bit
|
||||
enabling it is reset by a bit flip, the header will appear to be intact
|
||||
(in spite of being corrupt) while the compressed blocks will appear to
|
||||
be totally unrecoverable (in spite of being intact). Very misleading
|
||||
indeed.
|
||||
|
||||
@item Metadata
|
||||
|
||||
The gzip format stores some metadata, like the modification time of the
|
||||
original file or the operating system on which compression took place.
|
||||
This complicates reproducible compression (obtaining identical
|
||||
compressed output from identical input).
|
||||
|
||||
@end table
|
||||
|
||||
|
@ -488,7 +554,7 @@ unrecoverable (in spite of being intact). Very misleading indeed.
|
|||
|
||||
Probably the most frequently reported shortcoming of the gzip format is
|
||||
that it only stores the least significant 32 bits of the uncompressed
|
||||
size. The size of any file larger than 4 GiB gets truncated.
|
||||
size. The size of any file larger than @w{4 GiB} gets truncated.
|
||||
|
||||
Bzip2 does not store the uncompressed size of the file.
|
||||
|
||||
|
@ -965,9 +1031,10 @@ padding zero bytes to a lzip file.
|
|||
@item
|
||||
Useful data added by the user; a cryptographically secure hash, a
|
||||
description of file contents, etc. It is safe to append any amount of
|
||||
text to a lzip file as long as the text does not begin with the string
|
||||
"LZIP", and does not contain any zero bytes (null characters). Nonzero
|
||||
bytes and zero bytes can't be safely mixed in trailing data.
|
||||
text to a lzip file as long as none of the first four bytes of the text
|
||||
match the corresponding byte in the string "LZIP", and the text does not
|
||||
contain any zero bytes (null characters). Nonzero bytes and zero bytes
|
||||
can't be safely mixed in trailing data.
|
||||
|
||||
@item
|
||||
Garbage added by some not totally successful copy operation.
|
||||
|
@ -977,12 +1044,16 @@ Malicious data added to the file in order to make its total size and
|
|||
hash value (for a chosen hash) coincide with those of another file.
|
||||
|
||||
@item
|
||||
In very rare cases, trailing data could be the corrupt header of another
|
||||
In rare cases, trailing data could be the corrupt header of another
|
||||
member. In multimember or concatenated files the probability of
|
||||
corruption happening in the magic bytes is 5 times smaller than the
|
||||
probability of getting a false positive caused by the corruption of the
|
||||
integrity information itself. Therefore it can be considered to be below
|
||||
the noise level.
|
||||
the noise level. Additionally, the test used by clzip to discriminate
|
||||
trailing data from a corrupt header has a Hamming distance (HD) of 3,
|
||||
and the 3 bit flips must happen in different magic bytes for the test to
|
||||
fail. In any case, the option @samp{--trailing-error} guarantees that
|
||||
any corrupt header will be detected.
|
||||
@end itemize
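
The condition given in the list above for text that is safe to append (in its 1.10 wording) can be checked mechanically. The function below is a sketch with a hypothetical name; it only encodes the two stated conditions and is not the test clzip itself uses to tell trailing data from a corrupt header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of clzip. Returns true if 'text' meets
// both conditions stated in the list above:
//   - none of its first four bytes matches the corresponding byte of "LZIP"
//   - it contains no zero bytes
bool is_safe_trailing_text( const std::string & text )
  {
  static const char magic[4] = { 'L', 'Z', 'I', 'P' };
  for( std::size_t i = 0; i < text.size(); ++i )
    {
    if( text[i] == 0 ) return false;			// zero byte
    if( i < 4 && text[i] == magic[i] ) return false;	// matches magic byte
    }
  return true;
  }
```

Note that a text such as "xZyy" is rejected because its second byte matches the second magic byte, even though the text does not begin with "LZIP".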
|
||||
|
||||
Trailing data are in no way part of the lzip file format, but tools
|
||||
|
@ -1018,7 +1089,7 @@ clzip -v file
|
|||
@sp 1
|
||||
@noindent
|
||||
Example 2: Like example 1 but the created @samp{file.lz} is multimember
|
||||
with a member size of 1 MiB. The compression ratio is not shown.
|
||||
with a member size of @w{1 MiB}. The compression ratio is not shown.
|
||||
|
||||
@example
|
||||
clzip -b 1MiB file
|
||||
|
@ -1067,7 +1138,7 @@ Do this instead
|
|||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 7: Decompress @samp{file.lz} partially until 10 KiB of
|
||||
Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of
|
||||
decompressed data are produced.
|
||||
|
||||
@example
|
||||
|
@ -1086,7 +1157,7 @@ clzip -cd file.lz | dd bs=1000 skip=10 count=5
|
|||
@sp 1
|
||||
@noindent
|
||||
Example 9: Create a multivolume compressed tar archive with a volume
|
||||
size of 1440 KiB.
|
||||
size of @w{1440 KiB}.
|
||||
|
||||
@example
|
||||
tar -c some_directory | clzip -S 1440KiB -o volume_name
|
||||
|
@ -1103,8 +1174,8 @@ clzip -cd volume_name*.lz | tar -xf -
|
|||
@sp 1
|
||||
@noindent
|
||||
Example 11: Create a multivolume compressed backup of a large database
|
||||
file with a volume size of 650 MB, where each volume is a multimember
|
||||
file with a member size of 32 MiB.
|
||||
file with a volume size of @w{650 MB}, where each volume is a
|
||||
multimember file with a member size of @w{32 MiB}.
|
||||
|
||||
@example
|
||||
clzip -b 32MiB -S 650MB big_db
|
||||
|
@ -1132,7 +1203,7 @@ find by running @w{@code{clzip --version}}.
|
|||
|
||||
@verbatim
|
||||
/* Lzd - Educational decompressor for the lzip format
|
||||
Copyright (C) 2013-2017 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -1448,9 +1519,9 @@ bool LZ_decoder::decode_member() // Returns false if error
|
|||
Bit_model bm_align[dis_align_size];
|
||||
Len_model match_len_model;
|
||||
Len_model rep_len_model;
|
||||
unsigned rep0 = 0; // rep[0-3] latest four distances
|
||||
unsigned rep1 = 0; // used for efficient coding of
|
||||
unsigned rep2 = 0; // repeated distances
|
||||
unsigned rep0 = 0; // rep[0-3] latest four distances
|
||||
unsigned rep1 = 0; // used for efficient coding of
|
||||
unsigned rep2 = 0; // repeated distances
|
||||
unsigned rep3 = 0;
|
||||
State state;
|
||||
|
||||
|
@ -1545,7 +1616,7 @@ int main( const int argc, const char * const argv[] )
|
|||
"It is not safe to use lzd for any real work.\n"
|
||||
"\nUsage: %s < file.lz > file\n", argv[0] );
|
||||
std::printf( "Lzd decompresses from standard input to standard output.\n"
|
||||
"\nCopyright (C) 2017 Antonio Diaz Diaz.\n"
|
||||
"\nCopyright (C) 2018 Antonio Diaz Diaz.\n"
|
||||
"This is free software: you are free to change and redistribute it.\n"
|
||||
"There is NO WARRANTY, to the extent permitted by law.\n"
|
||||
"Report bugs to lzip-bug@nongnu.org\n"
|
||||
|
@ -1590,7 +1661,7 @@ int main( const int argc, const char * const argv[] )
|
|||
}
|
||||
|
||||
if( std::fclose( stdout ) != 0 )
|
||||
{ std::fprintf( stderr, "Can't close stdout: %s\n", std::strerror( errno ) );
|
||||
{ std::fprintf( stderr, "Error closing stdout: %s\n", std::strerror( errno ) );
|
||||
return 1; }
|
||||
return 0;
|
||||
}
|
||||
|
|