Merging upstream version 1.14.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-20 21:33:45 +01:00
parent a5f6fd65d6
commit 981b7e2738
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
29 changed files with 744 additions and 652 deletions

@@ -11,7 +11,7 @@ File: lzlib.info, Node: Top, Next: Introduction, Up: (dir)
Lzlib Manual
************
This manual is for Lzlib (version 1.13, 23 January 2022).
This manual is for Lzlib (version 1.14, 20 January 2024).
* Menu:
@@ -23,14 +23,14 @@ This manual is for Lzlib (version 1.13, 23 January 2022).
* Decompression functions:: Descriptions of the decompression functions
* Error codes:: Meaning of codes returned by functions
* Error messages:: Error messages corresponding to error codes
* Invoking minilzip:: Command line interface of the test program
* Invoking minilzip:: Command-line interface of the test program
* Data format:: Detailed format of the compressed data
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
Copyright (C) 2009-2022 Antonio Diaz Diaz.
Copyright (C) 2009-2024 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@@ -76,6 +76,13 @@ library are declared in the file 'lzlib.h'. Usage examples of the library
are given in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from
the source distribution.
As 'lzlib.h' can be used by C and C++ programs, it must not impose a
choice of system headers on the program by including one of them. Therefore
it is the responsibility of the program using lzlib to include before
'lzlib.h' some header that declares the type 'uint8_t'. There are at least
four such headers in C and C++: 'stdint.h', 'cstdint', 'inttypes.h', and
'cinttypes'.
All the library functions are thread safe. The library does not install
any signal handler. The decoder checks the consistency of the compressed
data, so the library should never crash even in case of corrupted input.
@@ -86,21 +93,21 @@ This interface is safer and less error prone than the traditional zlib
interface.
Compression/decompression is done when the read function is called. This
means the value returned by the position functions will not be updated until
a read call, even if a lot of data are written. If you want the data to be
means the value returned by the position functions is not updated until a
read call, even if a lot of data are written. If you want the data to be
compressed in advance, just call the read function with a SIZE equal to 0.
If all the data to be compressed are written in advance, lzlib will
automatically adjust the header of the compressed data to use the largest
If all the data to be compressed are written in advance, lzlib
automatically adjusts the header of the compressed data to use the largest
dictionary size that does not exceed neither the data size nor the limit
given to 'LZ_compress_open'. This feature reduces the amount of memory
needed for decompression and allows minilzip to produce identical compressed
output as lzip.
needed for decompression and allows minilzip to produce identical
compressed output as lzip.
Lzlib will correctly decompress a data stream which is the concatenation
of two or more compressed data streams. The result is the concatenation of
the corresponding decompressed data streams. Integrity testing of
concatenated compressed data streams is also supported.
Lzlib correctly decompresses a data stream which is the concatenation of
two or more compressed data streams. The result is the concatenation of the
corresponding decompressed data streams. Integrity testing of concatenated
compressed data streams is also supported.
Lzlib is able to compress and decompress streams of unlimited size by
automatically creating multimember output. The members so created are large,
@@ -111,22 +118,22 @@ concrete algorithm; it is more like "any algorithm using the LZMA coding
scheme". For example, the option '-0' of lzip uses the scheme in almost the
simplest way possible; issuing the longest match it can find, or a literal
byte if it can't find a match. Inversely, a much more elaborated way of
finding coding sequences of minimum size than the one currently used by lzip
could be developed, and the resulting sequence could also be coded using the
LZMA coding scheme.
finding coding sequences of minimum size than the one currently used by
lzip could be developed, and the resulting sequence could also be coded
using the LZMA coding scheme.
Lzlib currently implements two variants of the LZMA algorithm: fast
(used by option '-0' of minilzip) and normal (used by all other compression
levels).
The high compression of LZMA comes from combining two basic, well-proven
compression ideas: sliding dictionaries (LZ77/78) and markov models (the
thing used by every compression algorithm that uses a range encoder or
similar order-0 entropy coder as its last stage) with segregation of
contexts according to what the bits are used for.
compression ideas: sliding dictionaries (LZ77) and markov models (the thing
used by every compression algorithm that uses a range encoder or similar
order-0 entropy coder as its last stage) with segregation of contexts
according to what the bits are used for.
The ideas embodied in lzlib are due to (at least) the following people:
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the
definition of Markov chains), G.N.N. Martin (for the definition of range
encoding), Igor Pavlov (for putting all the above together in LZMA), and
Julian Seward (for bzip2's CLI).
@@ -150,7 +157,7 @@ of them are declared in 'lzlib.h'.
-- Constant: LZ_API_VERSION
This constant is defined in 'lzlib.h' and works as a version test
macro. The application should verify at compile time that
macro. The application should check at compile time that
LZ_API_VERSION is greater than or equal to the version required by the
application:
@@ -170,12 +177,13 @@ desire to have certain symbols and prototypes exposed.
-- Function: int LZ_api_version ( void )
If LZ_API_VERSION >= 1012, this function is declared in 'lzlib.h' (else
it doesn't exist). It returns the LZ_API_VERSION of the library object
code being used. The application should verify at run time that the
code being used. The application should check at run time that the
value returned by 'LZ_api_version' is greater than or equal to the
version required by the application. An application may be dinamically
version required by the application. An application may be dynamically
linked at run time with a different version of lzlib than the one it
was compiled for, and this should not break the program as long as the
library used provides the functionality required by the application.
was compiled for, and this should not break the application as long as
the library used provides the functionality required by the
application.
#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
if( LZ_api_version() < 1012 )
@@ -258,7 +266,7 @@ File: lzlib.info, Node: Compression functions, Next: Decompression functions,
These are the functions used to compress data. In case of error, all of
them return -1 or 0, for signed and unsigned return values respectively,
except 'LZ_compress_open' whose return value must be verified by calling
except 'LZ_compress_open' whose return value must be checked by calling
'LZ_compress_errno' before using it.
-- Function: struct LZ_Encoder * LZ_compress_open ( const int
@@ -269,7 +277,7 @@ except 'LZ_compress_open' whose return value must be verified by calling
LZ_compress functions, or a null pointer if the encoder could not be
allocated.
The returned pointer must be verified by calling 'LZ_compress_errno'
The returned pointer must be checked by calling 'LZ_compress_errno'
before using it. If 'LZ_compress_errno' does not return 'LZ_ok', the
returned pointer must not be used and should be freed with
'LZ_compress_close' to avoid memory leaks.
@@ -277,8 +285,8 @@ except 'LZ_compress_open' whose return value must be verified by calling
DICTIONARY_SIZE sets the dictionary size to be used, in bytes. Valid
values range from 4 KiB to 512 MiB. Note that dictionary sizes are
quantized. If the size specified does not match one of the valid
sizes, it will be rounded upwards by adding up to
(DICTIONARY_SIZE / 8) to it.
sizes, it is rounded upwards by adding up to (DICTIONARY_SIZE / 8) to
it.
MATCH_LEN_LIMIT sets the match length limit in bytes. Valid values
range from 5 to 273. Larger values usually give better compression
@@ -286,15 +294,14 @@ except 'LZ_compress_open' whose return value must be verified by calling
If DICTIONARY_SIZE is 65535 and MATCH_LEN_LIMIT is 16, the fast
variant of LZMA is chosen, which produces identical compressed output
as 'lzip -0'. (The dictionary size used will be rounded upwards to
64 KiB).
as 'lzip -0'. (The dictionary size used is rounded upwards to 64 KiB).
MEMBER_SIZE sets the member size limit in bytes. Valid values range
from 4 KiB to 2 PiB. A small member size may degrade compression
ratio, so use it only when needed. To produce a single-member data
stream, give MEMBER_SIZE a value larger than the amount of data to be
produced. Values larger than 2 PiB will be reduced to 2 PiB to prevent
the uncompressed size of the member from overflowing.
produced. Values larger than 2 PiB are reduced to 2 PiB to prevent the
uncompressed size of the member from overflowing.
-- Function: int LZ_compress_close ( struct LZ_Encoder * const ENCODER )
Frees all dynamically allocated data structures for this stream. This
@@ -420,7 +427,7 @@ File: lzlib.info, Node: Decompression functions, Next: Error codes, Prev: Com
These are the functions used to decompress data. In case of error, all of
them return -1 or 0, for signed and unsigned return values respectively,
except 'LZ_decompress_open' whose return value must be verified by calling
except 'LZ_decompress_open' whose return value must be checked by calling
'LZ_decompress_errno' before using it.
-- Function: struct LZ_Decoder * LZ_decompress_open ( void )
@@ -429,7 +436,7 @@ except 'LZ_decompress_open' whose return value must be verified by calling
LZ_decompress functions, or a null pointer if the decoder could not be
allocated.
The returned pointer must be verified by calling 'LZ_decompress_errno'
The returned pointer must be checked by calling 'LZ_decompress_errno'
before using it. If 'LZ_decompress_errno' does not return 'LZ_ok', the
returned pointer must not be used and should be freed with
'LZ_decompress_close' to avoid memory leaks.
@@ -459,13 +466,13 @@ except 'LZ_decompress_open' whose return value must be verified by calling
Resets the error state of DECODER and enters a search state that lasts
until a new member header (or the end of the stream) is found. After a
successful call to 'LZ_decompress_sync_to_member', data written with
'LZ_decompress_write' will be consumed and 'LZ_decompress_read' will
return 0 until a header is found.
'LZ_decompress_write' is consumed and 'LZ_decompress_read' returns 0
until a header is found.
This function is useful to discard any data preceding the first member,
or to discard the rest of the current member, for example in case of a
data error. If the decoder is already at the beginning of a member,
this function does nothing.
This function is useful to discard any data preceding the first
member, or to discard the rest of the current member, for example in
case of a data error. If the decoder is already at the beginning of a
member, this function does nothing.
-- Function: int LZ_decompress_read ( struct LZ_Decoder * const DECODER,
uint8_t * const BUFFER, const int SIZE )
@@ -571,7 +578,7 @@ File: lzlib.info, Node: Error codes, Next: Error messages, Prev: Decompressio
Most library functions return -1 to indicate that they have failed. But
this return value only tells you that an error has occurred. To find out
what kind of error it was, you need to verify the error code by calling
what kind of error it was, you need to check the error code by calling
'LZ_(de)compress_errno'.
Library functions don't change the value returned by
@@ -639,19 +646,20 @@ File: lzlib.info, Node: Invoking minilzip, Next: Data format, Prev: Error mes
9 Invoking minilzip
*******************
Minilzip is a test program for the compression library lzlib, fully
compatible with lzip 1.4 or newer.
Minilzip is a test program for the compression library lzlib, compatible
with lzip 1.4 or newer.
Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
checking to maximize interoperability and optimize safety. Lzip can compress
about as fast as gzip (lzip -0) or compress most files more than bzip2
(lzip -9). Decompression speed is intermediate between gzip and bzip2. Lzip
is better than gzip and bzip2 from a data recovery perspective. Lzip has
been designed, written, and tested with great care to replace gzip and
bzip2 as the standard general-purpose compressed format for unix-like
systems.
chain-Algorithm' (LZMA) stream format to maximize interoperability. The
maximum dictionary size is 512 MiB so that any lzip file can be decompressed
on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
checking. Lzip can compress about as fast as gzip (lzip -0) or compress most
files more than bzip2 (lzip -9). Decompression speed is intermediate between
gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
perspective. Lzip has been designed, written, and tested with great care to
replace gzip and bzip2 as the standard general-purpose compressed format for
Unix-like systems.
The format for running minilzip is:
@@ -660,7 +668,8 @@ The format for running minilzip is:
If no file names are specified, minilzip compresses (or decompresses) from
standard input to standard output. A hyphen '-' used as a FILE argument
means standard input. It can be mixed with other FILES and is read just
once, the first time it appears in the command line.
once, the first time it appears in the command line. Remember to prepend
'./' to any file name beginning with a hyphen, or use '--'.
minilzip supports the following options: *Note Argument syntax:
(arg_parser)Argument syntax.
@@ -696,17 +705,18 @@ once, the first time it appears in the command line.
members). This option (or '-o') is needed when reading from a named
pipe (fifo) or from a device. Use it also to recover as much of the
decompressed data as possible when decompressing a corrupt file. '-c'
overrides '-o' and '-S'. '-c' has no effect when testing or listing.
overrides '-o' and '-S'. '-c' has no effect when testing.
'-d'
'--decompress'
Decompress the files specified. If a file does not exist, can't be
opened, or the destination file already exists and '--force' has not
been specified, minilzip continues decompressing the rest of the files
and exits with error status 1. If a file fails to decompress, or is a
terminal, minilzip exits immediately with error status 2 without
decompressing the rest of the files. A terminal is considered an
uncompressed file, and therefore invalid.
Decompress the files specified. The integrity of the files specified is
checked. If a file does not exist, can't be opened, or the destination
file already exists and '--force' has not been specified, minilzip
continues decompressing the rest of the files and exits with error
status 1. If a file fails to decompress, or is a terminal, minilzip
exits immediately with error status 2 without decompressing the rest
of the files. A terminal is considered an uncompressed file, and
therefore invalid.
'-f'
'--force'
@@ -725,17 +735,17 @@ once, the first time it appears in the command line.
'--match-length=BYTES'
When compressing, set the match length limit in bytes. After a match
this long is found, the search is finished. Valid values range from 5
to 273. Larger values usually give better compression ratios but longer
compression times.
to 273. Larger values usually give better compression ratios but
longer compression times.
'-o FILE'
'--output=FILE'
If '-c' has not been also specified, write the (de)compressed output to
FILE; keep input files unchanged. If compressing several files, each
file is compressed independently. (The output consists of a sequence of
independently compressed members). This option (or '-c') is needed when
reading from a named pipe (fifo) or from a device. '-o -' is
equivalent to '-c'. '-o' has no effect when testing or listing.
If '-c' has not been also specified, write the (de)compressed output
to FILE; keep input files unchanged. If compressing several files,
each file is compressed independently. (The output consists of a
sequence of independently compressed members). This option (or '-c')
is needed when reading from a named pipe (fifo) or from a device.
'-o -' is equivalent to '-c'. '-o' has no effect when testing.
When compressing and splitting the output in volumes, FILE is used as
a prefix, and several files named 'FILE00001.lz', 'FILE00002.lz', etc,
@@ -748,13 +758,13 @@ once, the first time it appears in the command line.
'-s BYTES'
'--dictionary-size=BYTES'
When compressing, set the dictionary size limit in bytes. Minilzip
will use for each file the largest dictionary size that does not
exceed neither the file size nor this limit. Valid values range from
4 KiB to 512 MiB. Values 12 to 29 are interpreted as powers of two,
meaning 2^12 to 2^29 bytes. Dictionary sizes are quantized so that
they can be coded in just one byte (*note coded-dict-size::). If the
size specified does not match one of the valid sizes, it will be
rounded upwards by adding up to (BYTES / 8) to it.
uses for each file the largest dictionary size that does not exceed
neither the file size nor this limit. Valid values range from 4 KiB to
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
coded in just one byte (*note coded-dict-size::). If the size
specified does not match one of the valid sizes, it is rounded upwards
by adding up to (BYTES / 8) to it.
For maximum compression you should use a dictionary size limit as large
as possible, but keep in mind that the decompression memory requirement
@@ -776,7 +786,7 @@ once, the first time it appears in the command line.
really performs a trial decompression and throws away the result. Use
it together with '-v' to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal,
minilzip continues checking the rest of the files. A final diagnostic
minilzip continues testing the rest of the files. A final diagnostic
is shown at verbosity level 1 or higher if any file fails the test
when testing multiple files.
@@ -839,26 +849,29 @@ once, the first time it appears in the command line.
defined). *Note Library version::.
Numbers given as arguments to options may be followed by a multiplier
and an optional 'B' for "byte".
Numbers given as arguments to options may be expressed in decimal,
hexadecimal, or octal (using the same syntax as integer constants in C++),
and may be followed by a multiplier and an optional 'B' for "byte".
Table of SI and binary prefixes (unit multipliers):
Prefix Value | Prefix Value
k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024)
M megabyte (10^6) | Mi mebibyte (2^20)
G gigabyte (10^9) | Gi gibibyte (2^30)
T terabyte (10^12) | Ti tebibyte (2^40)
P petabyte (10^15) | Pi pebibyte (2^50)
E exabyte (10^18) | Ei exbibyte (2^60)
Z zettabyte (10^21) | Zi zebibyte (2^70)
Y yottabyte (10^24) | Yi yobibyte (2^80)
Prefix Value | Prefix Value
k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024)
M megabyte (10^6) | Mi mebibyte (2^20)
G gigabyte (10^9) | Gi gibibyte (2^30)
T terabyte (10^12) | Ti tebibyte (2^40)
P petabyte (10^15) | Pi pebibyte (2^50)
E exabyte (10^18) | Ei exbibyte (2^60)
Z zettabyte (10^21) | Zi zebibyte (2^70)
Y yottabyte (10^24) | Yi yobibyte (2^80)
R ronnabyte (10^27) | Ri robibyte (2^90)
Q quettabyte (10^30) | Qi quebibyte (2^100)
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
input file, 3 for an internal consistency error (e.g., bug) which caused
minilzip to panic.
found, invalid command-line options, I/O errors, etc), 2 to indicate a
corrupt or invalid input file, 3 for an internal consistency error (e.g.,
bug) which caused minilzip to panic.
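
The number syntax described above (a decimal, hexadecimal, or octal value, an optional multiplier from the table, and an optional 'B') can be sketched in C as follows. This is only an illustration, not minilzip's actual parser: the name parse_size is hypothetical, and the sketch assumes results must fit in 64 bits, so the larger prefixes (Z, Y, R, Q and their binary forms) are rejected as overflow for any nonzero value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of lzlib or minilzip): parse
   "<number>[prefix][B]" into *result. Returns false on a syntax
   error or if the value does not fit in 64 bits. */
static bool parse_size( const char * const arg, uint64_t * const result )
  {
  static const char prefixes[] = "kMGTPEZYRQ";	/* same order as the table */
  char * tail;
  unsigned long long parsed;
  uint64_t value;

  /* require a leading digit; rejects empty strings, signs and spaces */
  if( arg[0] < '0' || arg[0] > '9' ) return false;
  errno = 0;
  /* base 0 accepts decimal, 0x... (hexadecimal) and 0... (octal) */
  parsed = strtoull( arg, &tail, 0 );
  if( tail == arg || errno == ERANGE || parsed > UINT64_MAX ) return false;
  value = parsed;

  if( *tail && *tail != 'B' )
    {
    const char c = *tail++;
    const bool binary = ( *tail == 'i' );
    const char * const p = strchr( prefixes, ( c == 'K' ) ? 'k' : c );
    if( binary ) ++tail;
    /* the table spells the first pair as 'k' (SI) and 'Ki' (binary) */
    if( !p || ( c == 'K' && !binary ) || ( c == 'k' && binary ) )
      return false;
    const uint64_t factor = binary ? 1024 : 1000;
    /* multiply once per power, checking for overflow at each step */
    for( int i = (int)( p - prefixes ) + 1; i > 0; --i )
      {
      if( value > UINT64_MAX / factor ) return false;
      value *= factor;
      }
    }
  if( *tail == 'B' ) ++tail;		/* optional 'B' for "byte" */
  if( *tail != 0 ) return false;	/* trailing garbage */
  *result = value;
  return true;
  }
```

With this sketch, '4Ki' yields 4096 and '512MiB' yields 536870912, the two ends of the valid dictionary size range. Note that a hexadecimal argument ending in 'B' is ambiguous in such a scheme, because 'B' is also a hexadecimal digit; the sketch lets strtoull consume it as a digit.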

File: lzlib.info, Node: Data format, Next: Examples, Prev: Invoking minilzip, Up: Top
@@ -886,7 +899,7 @@ when there is no longer anything to take away.
represents a variable number of bytes.
Lzip data consist of a series of independent "members" (compressed data
Lzip data consist of one or more independent "members" (compressed data
sets). The members simply appear one after another in the data stream, with
no additional information before, between, or after them. Each member can
encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
@@ -933,10 +946,10 @@ size of a multimember data stream is unlimited.
'Member size (8 bytes)'
Total size of the member, including header and trailer. This field acts
as a distributed index, allows the verification of stream integrity,
and facilitates the safe recovery of undamaged members from
multimember files. Member size should be limited to 2 PiB to prevent
the data size field from overflowing.
as a distributed index, improves the checking of stream integrity, and
facilitates the safe recovery of undamaged members from multimember
files. Lzip limits the member size to 2 PiB to prevent the data size
field from overflowing.

@@ -1234,7 +1247,7 @@ int ffrsdecompress( struct LZ_Decoder * const decoder,
if( LZ_decompress_errno( decoder ) == LZ_header_error ||
LZ_decompress_errno( decoder ) == LZ_data_error )
{ LZ_decompress_sync_to_member( decoder ); continue; }
else break;
break;
}
len = fwrite( buffer, 1, ret, outfile );
if( len < ret ) break;
@@ -1293,27 +1306,27 @@ Concept index
Tag Table:
Node: Top215
Node: Introduction1338
Node: Library version6413
Node: Buffering8957
Node: Parameter limits10182
Node: Compression functions11136
Ref: member_size12946
Ref: sync_flush14712
Node: Decompression functions19400
Node: Error codes26968
Node: Error messages29259
Node: Invoking minilzip29838
Node: Data format39786
Ref: coded-dict-size41232
Node: Examples42641
Node: Buffer compression43602
Node: Buffer decompression45122
Node: File compression46536
Node: File decompression47519
Node: File compression mm48523
Node: Skipping data errors51552
Node: Problems52862
Node: Concept index53423
Node: Library version6778
Node: Buffering9329
Node: Parameter limits10554
Node: Compression functions11508
Ref: member_size13301
Ref: sync_flush15063
Node: Decompression functions19751
Node: Error codes27308
Node: Error messages29598
Node: Invoking minilzip30177
Node: Data format40595
Ref: coded-dict-size42041
Node: Examples43446
Node: Buffer compression44407
Node: Buffer decompression45927
Node: File compression47341
Node: File decompression48324
Node: File compression mm49328
Node: Skipping data errors52357
Node: Problems53662
Node: Concept index54223

End Tag Table