
Merging upstream version 0.6.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 22:51:44 +01:00
parent e39d8907e0
commit f4329ad86e
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
13 changed files with 378 additions and 275 deletions

README

@@ -1,25 +1,23 @@
Description
Xlunzip is a test tool for the lzip decompression code of my lzip patch
for linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress
linux module as a backend. Xlunzip tests the module for stream,
buffer-to-buffer and mixed decompression modes, including in-place
decompression (using the same buffer for input and output). You can use
xlunzip to verify that the module produces correct results when
decompressing single member files, multimember files, or the
concatenation of two or more compressed files. Xlunzip can be used with
unzcrash to test the robustness of the module to the decompression of
corrupted data.
Xlunzip is a test tool for the lzip decompression code of my lzip patch for
linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux
module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,
and mixed decompression modes, including in-place decompression (using the
same buffer for input and output). You can use xlunzip to verify that the
module produces correct results when decompressing single member files,
multimember files, or the concatenation of two or more compressed files.
Xlunzip can be used with unzcrash to test the robustness of the module to
the decompression of corrupted data.
Note that the in-place decompression of concatenated files can't be
guaranteed to work because an arbitrarily low compression ratio of the
last part of the data can be achieved by appending enough empty
compressed members to a file, masking a high compression ratio at the
beginning of the data.
The distributed index feature of the lzip format allows xlunzip to
decompress concatenated files in place. This can't be guaranteed to work
with formats like gzip or bzip2 because they can't detect whether a high
compression ratio in the first members of the multimember data is being
masked by a low compression ratio in the last members.
The xlunzip tarball contains a copy of the lzip_decompress module and
can be compiled and tested without downloading or applying the patch to
the kernel.
The xlunzip tarball contains a copy of the lzip_decompress module and can be
compiled and tested without downloading or applying the patch to the kernel.
My lzip patch for linux can be found at
http://download.savannah.gnu.org/releases/lzip/kernel/
@@ -29,14 +27,72 @@ Lzip related components in the kernel
The lzip_decompress module in lib/lzip_decompress.c provides a versatile
lzip decompression function able to do buffer to buffer decompression or
stream decompression with fill and flush callback functions. The usage
of the function is documented in include/linux/lzip.h.
stream decompression with fill and flush callback functions. The usage of
the function is documented in include/linux/lzip.h.
For decompressing the kernel image, initramfs, and initrd, there is a
wrapper function in lib/decompress_lunzip.c providing the same common
interface as the other decompress_*.c files, which is defined in
include/linux/decompress/generic.h.
Analysis of the in-place decompression
======================================
In order to decompress the kernel in place (using the same buffer for input
and output), the compressed data is placed at the end of the buffer used to
hold the decompressed data. After the decompressed data, the buffer must
have extra space for a marker, a trailer, the maximum possible data
expansion, and (if the compressed data consists of more than one member)
N-1 empty members.
                 |------ compressed data ------|
                 V                             V
|----------------|-------------------|---------|
^                                    ^  extra
|-------- decompressed data ---------|
The input pointer initially points to the beginning of the compressed data
and the output pointer initially points to the beginning of the buffer.
Decompressing compressible data reduces the distance between the pointers,
while decompressing uncompressible data increases the distance. The extra
space must be large enough that the output pointer does not overrun the
input pointer even if all the overlap between compressed and decompressed
data is uncompressible. The worst case is very compressible data followed by
uncompressible data because in this case the output pointer increases faster
when the input pointer is smaller.
        |                    * <-- input pointer
        |                *  , <-- output pointer
        |            * , '
        |       x ' <-- overrun (x)
memory  |    * ,'
address |  * ,'
        |*  ,'
        |  ,'
        | ,'
        |,'
        `--------------------------
                   time
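The pointer behavior described above can be explored with a small model. This is only a sketch under simplifying assumptions: the function name min_pointer_gap, the segment sizes, and the extra values are hypothetical, each segment is assumed to be consumed and produced at a constant rate (so the tightest point is at a segment boundary), and the marker and trailer are ignored.

```python
def min_pointer_gap(segments, extra):
    """Return the smallest distance (input pointer - output pointer).

    segments: list of (uncompressed_len, compressed_len) pairs in stream
              order (hypothetical sizes, not measured data).
    extra:    bytes of extra space after the decompressed data.
    A negative result means the output pointer overran the input pointer.
    """
    total_out = sum(u for u, _ in segments)
    total_in = sum(c for _, c in segments)
    out_pos = 0
    # The compressed data is placed at the end of the buffer.
    in_pos = total_out + extra - total_in
    gap = in_pos - out_pos
    for u, c in segments:
        out_pos += u
        in_pos += c
        gap = min(gap, in_pos - out_pos)
    return gap


# Very compressible data followed by data that expands by 1.4%.
worst = [(1_000_000, 200), (8_000_000, 8_112_000)]
# The same two parts in the opposite order.
easy = list(reversed(worst))

print(min_pointer_gap(worst, 100_000))   # negative: overrun
print(min_pointer_gap(easy, 100_000))    # positive: no overrun
print(min_pointer_gap(worst, 112_000))   # zero: just enough extra space
```

In this model the same amount of extra space is enough or not depending only on the order of the two parts, which is why the worst case is the one used to size the buffer.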
All we need to know to calculate the minimum required extra space is:
  The maximum expansion ratio.
  The size of the last part of a member required to verify integrity.
  For multimember data, the overhead per member (36 bytes for lzip).
The maximum expansion ratio of LZMA data is about 1.4%. Rounding this up
to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space
required to decompress lzip data in place is:
    extra_bytes = ( compressed_size >> 6 ) + members * 36
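As an illustration, the formula above can be written as a small helper. The function name lzip_extra_bytes and the example sizes are hypothetical; only the shift by 6 and the 36 bytes per member come from the text.

```python
LZIP_MEMBER_OVERHEAD = 36  # bytes per member, as stated in the text


def lzip_extra_bytes(compressed_size, members=1):
    """Extra space needed after the decompressed data (formula above)."""
    if compressed_size < 0 or members < 1:
        raise ValueError("compressed_size must be >= 0 and members >= 1")
    # compressed_size >> 6 is compressed_size / 64, i.e. 1.5625%.
    return (compressed_size >> 6) + members * LZIP_MEMBER_OVERHEAD


# Example: an 8 MiB single-member file, then a small 3-member file.
print(lzip_extra_bytes(8 * 1024 * 1024))
print(lzip_extra_bytes(6400, members=3))
```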
Using the compressed size to calculate the extra_bytes (as in the equation
above) may slightly overestimate the amount of space required in the worst
case. But calculating the extra_bytes from the uncompressed size (as does
linux) is wrong (and inefficient for high compression ratios). The formula
used in arch/x86/boot/header.S
    extra_bytes = (uncompressed_size >> 8) + 65536
fails with 1 MB of zeros followed by 8 MB of random data, and wastes memory
for compression ratios > 4:1.
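The failing case can be checked with rough arithmetic. The sketch below is not a measurement: it assumes that the random data expands by the 1.4% worst case quoted above, that the compressed size of the zeros is negligible, and that the space needed is about the expansion of the random part. The function and variable names are hypothetical.

```python
MIB = 1024 * 1024


def linux_extra_bytes(uncompressed_size):
    """Formula quoted above from arch/x86/boot/header.S."""
    return (uncompressed_size >> 8) + 65536


def lzip_extra_bytes(compressed_size, members=1):
    """Formula proposed above for lzip data."""
    return (compressed_size >> 6) + members * 36


zeros, noise = 1 * MIB, 8 * MIB
uncompressed_size = zeros + noise

# Assumption: random data expands by 1.4%; the zeros compress to almost
# nothing, so their compressed size is ignored here.
needed = noise * 14 // 1000
compressed_size = noise + needed

linux_extra = linux_extra_bytes(uncompressed_size)
lzip_extra = lzip_extra_bytes(compressed_size)

print("needed (approx.):", needed)
print("linux formula:   ", linux_extra, linux_extra >= needed)
print("lzip formula:    ", lzip_extra, lzip_extra >= needed)

# Proportional terms only: uncompressed/256 exceeds compressed/64 as soon
# as the compression ratio is above 4:1 (here 8:1).
print((8 * MIB >> 8) > (1 * MIB >> 6))
```

Under these assumptions the formula based on the uncompressed size comes out smaller than the space needed, while the one based on the compressed size leaves a margin.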
Copyright (C) 2016-2020 Antonio Diaz Diaz.