2025-02-17 22:46:39 +01:00
|
|
|
Description
|
|
|
|
|
2025-02-17 22:51:44 +01:00
|
|
|
Xlunzip is a test tool for the lzip decompression code of my lzip patch for
|
|
|
|
linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux
|
|
|
|
module as a backend. Xlunzip tests the module for stream, buffer-to-buffer,
|
|
|
|
and mixed decompression modes, including in-place decompression (using the
|
2025-02-17 22:53:16 +01:00
|
|
|
same buffer for input and output). You can use xlunzip to check that the
|
2025-02-17 22:51:44 +01:00
|
|
|
module produces correct results when decompressing single member files,
|
|
|
|
multimember files, or the concatenation of two or more compressed files.
|
|
|
|
Xlunzip can be used with unzcrash to test the robustness of the module to
|
|
|
|
the decompression of corrupted data.
|
|
|
|
|
|
|
|
The distributed index feature of the lzip format allows xlunzip to
|
|
|
|
decompress concatenated files in place. This can't be guaranteed to work
|
|
|
|
with formats like gzip or bzip2 because they can't detect whether a high
|
|
|
|
compression ratio in the first members of the multimember data is being
|
|
|
|
masked by a low compression ratio in the last members.
|
|
|
|
|
|
|
|
The xlunzip tarball contains a copy of the lzip_decompress module and can be
|
|
|
|
compiled and tested without downloading or applying the patch to the kernel.
|
2025-02-17 22:51:57 +01:00
|
|
|
|
2025-02-17 22:46:39 +01:00
|
|
|
My lzip patch for linux can be found at
|
|
|
|
http://download.savannah.gnu.org/releases/lzip/kernel/
|
|
|
|
|
|
|
|
Lzip related components in the kernel
|
|
|
|
=====================================
|
|
|
|
|
2025-02-17 22:51:25 +01:00
|
|
|
The lzip_decompress module in lib/lzip_decompress.c provides a versatile
|
2025-02-17 22:53:16 +01:00
|
|
|
lzip decompression function able to do buffer-to-buffer decompression or
|
2025-02-17 22:51:44 +01:00
|
|
|
stream decompression with fill and flush callback functions. The usage of
|
|
|
|
the function is documented in include/linux/lzip.h.
|
2025-02-17 22:46:39 +01:00
|
|
|
|
|
|
|
For decompressing the kernel image, initramfs, and initrd, there is a
|
|
|
|
wrapper function in lib/decompress_lunzip.c providing the same common
|
|
|
|
interface as the other decompress_*.c files, which is defined in
|
|
|
|
include/linux/decompress/generic.h.
|
|
|
|
|
2025-02-17 22:51:44 +01:00
|
|
|
Analysis of the in-place decompression
|
|
|
|
======================================
|
|
|
|
|
|
|
|
In order to decompress the kernel in place (using the same buffer for input
|
|
|
|
and output), the compressed data is placed at the end of the buffer used to
|
|
|
|
hold the decompressed data. The buffer must be large enough to contain after
|
|
|
|
the decompressed data extra space for a marker, a trailer, the maximum
|
|
|
|
possible data expansion, and (if the compressed data consists of more than
|
|
|
|
one member) N-1 empty members.
|
|
|
|
|
|
|
|
|------ compressed data ------|
|
|
|
|
V V
|
|
|
|
|----------------|-------------------|---------|
|
|
|
|
^ ^ extra
|
|
|
|
|-------- decompressed data ---------|
|
|
|
|
|
|
|
|
The input pointer initially points to the beginning of the compressed data
|
|
|
|
and the output pointer initially points to the beginning of the buffer.
|
|
|
|
Decompressing compressible data reduces the distance between the pointers,
|
|
|
|
while decompressing uncompressible data increases the distance. The extra
|
|
|
|
space must be large enough that the output pointer does not overrun the
|
|
|
|
input pointer even if all the overlap between compressed and decompressed
|
|
|
|
data is uncompressible. The worst case is very compressible data followed by
|
|
|
|
uncompressible data because in this case the output pointer increases faster
|
|
|
|
when the input pointer is smaller.
|
|
|
|
|
2025-02-17 22:51:57 +01:00
|
|
|
| * <-- input pointer (*)
|
|
|
|
| * , <-- output pointer (,)
|
2025-02-17 22:51:44 +01:00
|
|
|
| * , '
|
|
|
|
| x ' <-- overrun (x)
|
|
|
|
memory | * ,'
|
|
|
|
address | * ,'
|
|
|
|
|* ,'
|
|
|
|
| ,'
|
|
|
|
| ,'
|
|
|
|
|,'
|
2025-02-17 22:51:57 +01:00
|
|
|
'--------------------------
|
2025-02-17 22:51:44 +01:00
|
|
|
time
|
|
|
|
|
|
|
|
All we need to know to calculate the minimum required extra space is:
|
|
|
|
The maximum expansion ratio.
|
2025-02-17 22:53:16 +01:00
|
|
|
The size of the last part of a member required to check integrity.
|
2025-02-17 22:51:44 +01:00
|
|
|
For multimember data, the overhead per member. (36 bytes for lzip).
|
|
|
|
|
|
|
|
The maximum expansion ratio of LZMA data is of about 1.4%. Rounding this up
|
|
|
|
to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space
|
|
|
|
required to decompress lzip data in place is:
|
2025-02-17 22:51:57 +01:00
|
|
|
|
2025-02-17 22:51:44 +01:00
|
|
|
extra_bytes = ( compressed_size >> 6 ) + members * 36
|
|
|
|
|
2025-02-17 22:51:57 +01:00
|
|
|
Using the compressed size to calculate the extra_bytes (as in the formula
|
2025-02-17 22:51:44 +01:00
|
|
|
above) may slightly overestimate the amount of space required in the worst
|
2025-02-17 22:53:16 +01:00
|
|
|
case (maximum expansion). But calculating the extra_bytes from the
|
|
|
|
uncompressed size (as does linux currently) is wrong (and inefficient for
|
|
|
|
high compression ratios). The formula used in arch/x86/boot/header.S:
|
2025-02-17 22:51:57 +01:00
|
|
|
|
2025-02-17 22:53:16 +01:00
|
|
|
extra_bytes = ( uncompressed_size >> 8 ) + 131072
|
2025-02-17 22:51:57 +01:00
|
|
|
|
|
|
|
fails to decompress 1 MB of zeros followed by 8 MB of random data, wastes
|
|
|
|
memory for compression ratios larger than 4:1, and does not even consider
|
|
|
|
multimember data.
|
2025-02-17 22:51:44 +01:00
|
|
|
|
2025-02-17 22:46:39 +01:00
|
|
|
|
2025-02-17 22:53:16 +01:00
|
|
|
Copyright (C) 2016-2024 Antonio Diaz Diaz.
|
2025-02-17 22:46:39 +01:00
|
|
|
|
|
|
|
This file is free documentation: you have unlimited permission to copy,
|
2025-02-17 22:51:25 +01:00
|
|
|
distribute, and modify it.
|
2025-02-17 22:46:39 +01:00
|
|
|
|
2025-02-17 22:53:16 +01:00
|
|
|
The file Makefile.in is a data file used by configure to produce the Makefile.
|
|
|
|
It has the same copyright owner and permissions that configure itself.
|