xlunzip/README

Description

Xlunzip is a test tool for the lzip decompression code of my lzip patch for
linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux
module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,
and mixed decompression modes, including in-place decompression (using the
same buffer for input and output). You can use xlunzip to check that the
module produces correct results when decompressing single member files,
multimember files, or the concatenation of two or more compressed files.
Xlunzip can be used with unzcrash to test the robustness of the module to
the decompression of corrupted data.

The distributed index feature of the lzip format allows xlunzip to
decompress concatenated files in place. This can't be guaranteed to work
with formats like gzip or bzip2 because they can't detect whether a high
compression ratio in the first members of the multimember data is being
masked by a low compression ratio in the last members.

The xlunzip tarball contains a copy of the lzip_decompress module and can be
compiled and tested without downloading or applying the patch to the kernel.

My lzip patch for linux can be found at
http://download.savannah.gnu.org/releases/lzip/kernel/

Lzip related components in the kernel
=====================================

The lzip_decompress module in lib/lzip_decompress.c provides a versatile
lzip decompression function able to do buffer-to-buffer decompression or
stream decompression with fill and flush callback functions. The usage of
the function is documented in include/linux/lzip.h.

For decompressing the kernel image, initramfs, and initrd, there is a
wrapper function in lib/decompress_lunzip.c providing the same common
interface as the other decompress_*.c files, which is defined in
include/linux/decompress/generic.h.

Analysis of the in-place decompression
======================================

In order to decompress the kernel in place (using the same buffer for input
and output), the compressed data is placed at the end of the buffer used to
hold the decompressed data. The buffer must be large enough to contain after
the decompressed data extra space for a marker, a trailer, the maximum
possible data expansion, and (if the compressed data consists of more than
one member) N-1 empty members.

                 |------ compressed data ------|
                 V                             V
|----------------|-------------------|---------|
^                                    ^  extra
|-------- decompressed data ---------|

The input pointer initially points to the beginning of the compressed data
and the output pointer initially points to the beginning of the buffer.
Decompressing compressible data reduces the distance between the pointers,
while decompressing uncompressible data increases the distance. The extra
space must be large enough that the output pointer does not overrun the
input pointer even if all the overlap between compressed and decompressed
data is uncompressible. The worst case is very compressible data followed by
uncompressible data because in this case the output pointer increases faster
when the input pointer is smaller.

        |                        *   <-- input pointer (*)
        |                    *   ,   <-- output pointer (,)
        |                * ,  '
        |            x  '            <-- overrun (x)
memory  |        * ,'
address |    *   ,'
        |*     ,'
        |    ,'
        |  ,'
        |,'
        '--------------------------
                    time

All we need to know to calculate the minimum required extra space is:
  The maximum expansion ratio.
  The size of the last part of a member required to check integrity.
  For multimember data, the overhead per member. (36 bytes for lzip).

The maximum expansion ratio of LZMA data is of about 1.4%. Rounding this up
to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space
required to decompress lzip data in place is:

  extra_bytes = ( compressed_size >> 6 ) + members * 36

Using the compressed size to calculate the extra_bytes (as in the formula
above) may slightly overestimate the amount of space required in the worst
case (maximum expansion). But calculating the extra_bytes from the
uncompressed size (as does linux currently) is wrong (and inefficient for
high compression ratios). The formula used in arch/x86/boot/header.S:

  extra_bytes = ( uncompressed_size >> 8 ) + 131072

fails to decompress 1 MB of zeros followed by 8 MB of random data, wastes
memory for compression ratios larger than 4:1, and does not even consider
multimember data.


Copyright (C) 2016-2024 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute, and modify it.

The file Makefile.in is a data file used by configure to produce the Makefile.
It has the same copyright owner and permissions that configure itself.
Adding upstream version 0.3. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:46:39 +01:00			`Description`

Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`Xlunzip is a test tool for the lzip decompression code of my lzip patch for`
			`linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux`
			`module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,`
			`and mixed decompression modes, including in-place decompression (using the`
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`same buffer for input and output). You can use xlunzip to check that the`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`module produces correct results when decompressing single member files,`
			`multimember files, or the concatenation of two or more compressed files.`
			`Xlunzip can be used with unzcrash to test the robustness of the module to`
			`the decompression of corrupted data.`

			`The distributed index feature of the lzip format allows xlunzip to`
			`decompress concatenated files in place. This can't be guaranteed to work`
			`with formats like gzip or bzip2 because they can't detect whether a high`
			`compression ratio in the first members of the multimember data is being`
			`masked by a low compression ratio in the last members.`

			`The xlunzip tarball contains a copy of the lzip_decompress module and can be`
			`compiled and tested without downloading or applying the patch to the kernel.`
Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00
Adding upstream version 0.3. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:46:39 +01:00			`My lzip patch for linux can be found at`
			`http://download.savannah.gnu.org/releases/lzip/kernel/`

			`Lzip related components in the kernel`
			`=====================================`

Merging upstream version 0.5. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:25 +01:00			`The lzip_decompress module in lib/lzip_decompress.c provides a versatile`
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`lzip decompression function able to do buffer-to-buffer decompression or`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`stream decompression with fill and flush callback functions. The usage of`
			`the function is documented in include/linux/lzip.h.`
Adding upstream version 0.3. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:46:39 +01:00
			`For decompressing the kernel image, initramfs, and initrd, there is a`
			`wrapper function in lib/decompress_lunzip.c providing the same common`
			`interface as the other decompress_*.c files, which is defined in`
			`include/linux/decompress/generic.h.`

Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`Analysis of the in-place decompression`
			`======================================`

			`In order to decompress the kernel in place (using the same buffer for input`
			`and output), the compressed data is placed at the end of the buffer used to`
			`hold the decompressed data. The buffer must be large enough to contain after`
			`the decompressed data extra space for a marker, a trailer, the maximum`
			`possible data expansion, and (if the compressed data consists of more than`
			`one member) N-1 empty members.`

			`\|------ compressed data ------\|`
			`V V`
			`\|----------------\|-------------------\|---------\|`
			`^ ^ extra`
			`\|-------- decompressed data ---------\|`

			`The input pointer initially points to the beginning of the compressed data`
			`and the output pointer initially points to the beginning of the buffer.`
			`Decompressing compressible data reduces the distance between the pointers,`
			`while decompressing uncompressible data increases the distance. The extra`
			`space must be large enough that the output pointer does not overrun the`
			`input pointer even if all the overlap between compressed and decompressed`
			`data is uncompressible. The worst case is very compressible data followed by`
			`uncompressible data because in this case the output pointer increases faster`
			`when the input pointer is smaller.`

Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00			`\| * <-- input pointer (*)`
			`\| * , <-- output pointer (,)`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`\| * , '`
			`\| x ' <-- overrun (x)`
			`memory \| * ,'`
			`address \| * ,'`
			`\|* ,'`
			`\| ,'`
			`\| ,'`
			`\|,'`
Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00			`'--------------------------`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`time`

			`All we need to know to calculate the minimum required extra space is:`
			`The maximum expansion ratio.`
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`The size of the last part of a member required to check integrity.`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`For multimember data, the overhead per member. (36 bytes for lzip).`

			`The maximum expansion ratio of LZMA data is of about 1.4%. Rounding this up`
			`to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space`
			`required to decompress lzip data in place is:`
Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`extra_bytes = ( compressed_size >> 6 ) + members * 36`

Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00			`Using the compressed size to calculate the extra_bytes (as in the formula`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00			`above) may slightly overestimate the amount of space required in the worst`
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`case (maximum expansion). But calculating the extra_bytes from the`
			`uncompressed size (as does linux currently) is wrong (and inefficient for`
			`high compression ratios). The formula used in arch/x86/boot/header.S:`
Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`extra_bytes = ( uncompressed_size >> 8 ) + 131072`
Merging upstream version 0.7. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:57 +01:00
			`fails to decompress 1 MB of zeros followed by 8 MB of random data, wastes`
			`memory for compression ratios larger than 4:1, and does not even consider`
			`multimember data.`
Merging upstream version 0.6. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:44 +01:00
Adding upstream version 0.3. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:46:39 +01:00
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`Copyright (C) 2016-2024 Antonio Diaz Diaz.`
Adding upstream version 0.3. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:46:39 +01:00
			`This file is free documentation: you have unlimited permission to copy,`
Merging upstream version 0.5. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:51:25 +01:00			`distribute, and modify it.`
Adding upstream version 0.3. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:46:39 +01:00
Merging upstream version 0.8. Signed-off-by: Daniel Baumann <daniel@debian.org> 2025-02-17 22:53:16 +01:00			`The file Makefile.in is a data file used by configure to produce the Makefile.`
			`It has the same copyright owner and permissions that configure itself.`
No results found.