
Adding upstream version 1.9.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann, 2025-02-24 04:17:36 +01:00
parent cc73e0fc78
commit e28a4525c4
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
29 changed files with 2003 additions and 1566 deletions

ChangeLog

@@ -1,15 +1,42 @@
2021-01-03 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.9 released.
* main.cc (main): Report an error if a file name is empty.
Make '-o' behave like '-c', but writing to file instead of stdout.
Make '-c' and '-o' check whether the output is a terminal only once.
Do not open output if input is a terminal.
* main.cc: New option '--check-lib'.
* Replace 'decompressed', 'compressed' with 'out', 'in' in output.
* decompress.cc, dec_stream.cc, dec_stdout.cc:
Continue testing if any input file fails the test.
Show the largest dictionary size in a multimember file.
* main.cc: Show final diagnostic when testing multiple files.
* decompress.cc, dec_stream.cc [LZ_API_VERSION >= 1012]: Avoid
copying decompressed data when testing with lzlib 1.12 or newer.
* compress.cc, dec_stream.cc: Start only the worker threads required.
* dec_stream.cc: Splitter stops reading when trailing data is found.
Don't include trailing data in the compressed size shown.
Use plain comparison instead of Boyer-Moore to search for headers.
* lzip_index.cc: Improve messages for corruption in last header.
* decompress.cc: Shorten messages 'Data error' and 'Unexpected EOF'.
* main.cc: Set a valid invocation_name even if argc == 0.
* Document extraction from tar.lz in manual, '--help', and man page.
* plzip.texi (Introduction): Mention tarlz as an alternative.
* plzip.texi: Several fixes and improvements.
* testsuite: Add 8 new test files.
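The dec_stream.cc entry above replaces Boyer-Moore with a plain comparison when searching for headers. The sketch below illustrates why that is reasonable; it is an illustrative example, not plzip's actual code. The only fact assumed from the lzip format is that a member header starts with the magic string "LZIP"; for a needle this short, a plain scan is about as fast as Boyer-Moore and much simpler.

```cpp
#include <cstring>

// Illustrative sketch of a plain-comparison header search (not plzip's
// actual code).  A lzip member header begins with the four magic bytes
// "LZIP"; scan the buffer and compare four bytes at each position.
long find_lzip_header( const unsigned char * const buffer, const long size )
  {
  for( long i = 0; i + 4 <= size; ++i )
    if( std::memcmp( buffer + i, "LZIP", 4 ) == 0 )
      return i;			// offset of the first header found
  return -1;			// no header in buffer
  }
```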
2019-01-05 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.8 released.
* Rename File_* to Lzip_*.
* main.cc: New options '--in-slots' and '--out-slots'.
* main.cc: Increase default in_slots per worker from 2 to 4.
* main.cc: Increase default out_slots per worker from 32 to 64.
* lzip.h (Lzip_trailer): New function 'verify_consistency'.
* lzip_index.cc: Detect some kinds of corrupt trailers.
* main.cc (main): Check return value of close( infd ).
* plzip.texi: Improve description of '-0..-9', '-m', and '-s'.
* configure: New option '--with-mingw'.
* configure: Accept appending to CXXFLAGS, 'CXXFLAGS+=OPTIONS'.
* INSTALL: Document use of CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
@@ -20,19 +47,19 @@
packet queue by a circular buffer to reduce memory fragmentation.
* compress.cc: Return one empty packet at a time to reduce mem use.
* main.cc: Reduce threads on 32 bit systems to use under 2.22 GiB.
* main.cc: New option '--loose-trailing'.
* Improve corrupt header detection to HD = 3 on seekable files.
(On all files with lzlib 1.10 or newer).
* Replace 'bits/byte' with inverse compression ratio in output.
* Show progress of decompression at verbosity level 2 (-vv).
* Show progress of (de)compression only if stderr is a terminal.
* main.cc: Do not add a second .lz extension to the arg of -o.
* Show dictionary size at verbosity level 4 (-vvvv).
* main.cc (cleanup_and_fail): Suppress messages from other threads.
* list.cc: Add missing '#include <pthread.h>'.
* plzip.texi: New chapter 'Output'.
* plzip.texi (Memory requirements): Add table.
* plzip.texi (Program design): Add a block diagram.

2017-04-12 Antonio Diaz Diaz <antonio@gnu.org>
@@ -41,15 +68,15 @@
* Don't allow mixing different operations (-d, -l or -t).
* main.cc: Continue testing if any input file is a terminal.
* lzip_index.cc: Improve detection of bad dict and trailing data.
* lzip.h: Unify messages for bad magic, trailing data, etc.

2016-05-14 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.5 released.
* main.cc: New option '-a, --trailing-error'.
* main.cc (main): Delete '--output' file if infd is a terminal.
* main.cc (main): Don't use stdin more than once.
* plzip.texi: New chapters 'Trailing data' and 'Examples'.
* configure: Avoid warning on some shells when testing for g++.
* Makefile.in: Detect the existence of install-info.
* check.sh: A POSIX shell is required to run the tests.
@@ -65,20 +92,20 @@
* Version 1.3 released.
* dec_stream.cc: Don't use output packets or muxer when testing.
* Make '-dvvv' and '-tvvv' show dictionary size like lzip.
* lzip.h: Add missing 'const' to the declaration of 'compress'.
* plzip.texi: New chapters 'Memory requirements' and
'Minimum file sizes'.
* Makefile.in: New targets 'install*-compress'.

2014-08-29 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.2 released.
* main.cc (close_and_set_permissions): Behave like 'cp -p'.
* dec_stdout.cc, dec_stream.cc: Make 'slot_av' a vector to limit
the number of packets produced by each worker individually.
* plzip.texinfo: Rename to plzip.texi.
* plzip.texi: Document the approximate amount of memory required.
* Change license to GPL version 2 or later.

2013-09-17 Antonio Diaz Diaz <antonio@gnu.org>
@@ -89,14 +116,13 @@
2013-05-29 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.0 released.
* compress.cc: Change 'deliver_packet' to 'deliver_packets'.
* Scalability of decompression from/to regular files has been
increased by removing splitter and muxer when not needed.
* The number of worker threads is now limited to the number of
members when decompressing from a regular file.
* configure: Options now accept a separate argument.
* Makefile.in: New targets 'install-as-lzip' and 'install-bin'.
* main.cc: Use 'setmode' instead of '_setmode' on Windows and OS/2.
* main.cc: Define 'strtoull' to 'std::strtoul' on Windows.
@@ -104,17 +130,17 @@
* Version 0.9 released.
* Minor fixes and cleanups.
* configure: Rename 'datadir' to 'datarootdir'.

2012-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 0.8 released.
* main.cc: New option '-F, --recompress'.
* decompress.cc (decompress): Show compression ratio.
* main.cc (close_and_set_permissions): Inability to change output
file attributes has been downgraded from error to warning.
* Small change in '--help' output and man page.
* Change quote characters in messages as advised by GNU Standards.
* main.cc: Set stdin/stdout in binary mode on OS2.
* compress.cc: Reduce memory use of compressed packets.
* decompress.cc: Use Boyer-Moore algorithm to search for headers.
@@ -128,15 +154,16 @@
produced by workers to limit the amount of memory used.
* main.cc (open_instream): Don't show the message
" and '--stdout' was not specified" for directories, etc.
Exit with status 1 if any output file exists and is skipped.
* main.cc: Fix warning about fchown return value being ignored.
* testsuite: Rename 'test1' to 'test.txt'. New tests.

2010-03-20 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 0.6 released.
* Small portability fixes.
* plzip.texinfo: New chapter 'Program Design'.
Add missing description of option '-n, --threads'.
* Debug stats have been fixed.

2010-02-10 Antonio Diaz Diaz <ant_diaz@teleline.es>
@@ -154,7 +181,7 @@
2010-01-24 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 0.3 released.
* New option '-B, --data-size'.
* Output file is now removed if plzip is interrupted.
* This version automatically chooses the smallest possible
dictionary size for each member during compression, saving
@@ -164,15 +191,14 @@
2010-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 0.2 released.
* New options '-s, --dictionary-size' and '-m, --match-length'.
* 'lacos_rbtree' has been replaced with a circular buffer.

2009-12-05 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 0.1 released.
* This version is based on llzip-0.03 (2009-11-21), written by
Laszlo Ersek <lacos@caesar.elte.hu>. Thanks Laszlo!
From llzip-0.03/README:
llzip is a hack on my lbzip2-0.17 release. I ripped out the
@@ -184,8 +210,8 @@
until something better appears on the net.

Copyright (C) 2009-2021 Antonio Diaz Diaz.

This file is a collection of facts, and thus it is not copyrightable,
but just in case, you have unlimited permission to copy, distribute, and
modify it.

INSTALL

@@ -1,11 +1,15 @@
Requirements
------------
You will need a C++11 compiler and the compression library lzlib installed.
(gcc 3.3.6 or newer is recommended).
I use gcc 6.1.0 and 4.1.2, but the code should compile with any standards
compliant compiler.

Lzlib must be version 1.0 or newer, but the fast encoder requires lzlib 1.7
or newer, the Hamming distance (HD) = 3 detection of corrupt headers in
non-seekable multimember files requires lzlib 1.10 or newer, and the 'no
copy' optimization for testing requires lzlib 1.12 or newer.

Gcc is available at http://gcc.gnu.org.
Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.
@@ -33,7 +37,10 @@ the main archive.
To link against a lzlib not installed in a standard place, use:

  ./configure CPPFLAGS='-I <includedir>' LDFLAGS='-L <libdir>'

(Replace <includedir> with the directory containing the file lzlib.h,
and <libdir> with the directory containing the file liblz.a).

If you are compiling on MinGW, use --with-mingw (note that the Windows
I/O functions used with MinGW are not guaranteed to be thread safe):
@@ -50,11 +57,11 @@ the main archive.
documentation.

Or type 'make install-compress', which additionally compresses the
info manual and the man page after installation.
(Installing compressed docs may become the default in the future).

You can install only the program, the info manual, or the man page by
typing 'make install-bin', 'make install-info', or 'make install-man'
respectively.

Instead of 'make install', you can type 'make install-as-lzip' to
@@ -65,10 +72,10 @@ the main archive.
Another way
-----------
You can also compile plzip into a separate directory.
To do this, you must use a version of 'make' that supports the variable
'VPATH', such as GNU 'make'. 'cd' to the directory where you want the
object files and executables to go and run the 'configure' script.
'configure' automatically checks for the source code in '.', in '..', and
in the directory that 'configure' is in.

'configure' recognizes the option '--srcdir=DIR' to control where to

@@ -79,7 +86,7 @@ After running 'configure', you can run 'make' and 'make install' as
explained above.

Copyright (C) 2009-2021 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute, and modify it.

Makefile.in

@@ -129,7 +129,9 @@ dist : doc
	$(DISTNAME)/*.cc \
	$(DISTNAME)/testsuite/check.sh \
	$(DISTNAME)/testsuite/test.txt \
	$(DISTNAME)/testsuite/fox_*.lz \
	$(DISTNAME)/testsuite/test.txt.lz \
	$(DISTNAME)/testsuite/test_em.txt.lz
	rm -f $(DISTNAME)
	lzip -v -9 $(DISTNAME).tar

NEWS

@@ -1,31 +1,58 @@
Changes in version 1.9:

Plzip now reports an error if a file name is empty (plzip -t "").

Option '-o, --output' now behaves like '-c, --stdout', but sending the
output unconditionally to a file instead of to standard output. See the new
description of '-o' in the manual. This change is backwards compatible only
when (de)compressing from standard input alone. Therefore commands like:
  plzip -o foo.lz - bar < foo
must now be split into:
  plzip -o foo.lz - < foo
  plzip bar
or rewritten as:
  plzip - bar < foo > foo.lz

When using '-c' or '-o', plzip now checks whether the output is a terminal
only once.

Plzip now does not even open the output file if the input file is a
terminal.

The new option '--check-lib', which compares the version of lzlib used to
compile plzip with the version actually being used at run time, has been
added.

The words 'decompressed' and 'compressed' have been replaced with the
shorter 'out' and 'in' in the verbose output when decompressing or testing.

When checking the integrity of multiple files, plzip is now able to
continue checking the rest of the files (instead of exiting) if some of
them fail the test, allowing 'plzip --test' to show a final diagnostic with
the number of files that failed (just as 'lzip --test').

Testing is now slightly (1.6%) faster when using lzlib 1.12.

When compressing, or when decompressing or testing from a non-seekable
file or from standard input, plzip now starts only the number of worker
threads required.

When decompressing or testing from a non-seekable file or from standard
input, trailing data are now not counted in the compressed size shown.

When decompressing or testing a multimember file, plzip now shows the
largest dictionary size of all members in the file instead of showing the
dictionary size of the first member.

Option '--list' now reports corruption or truncation of the last header in
a multimember file specifically instead of showing the generic message
"Last member in input file is truncated or corrupt."

The error messages for 'Data error' and 'Unexpected EOF' have been
shortened.

The commands needed to extract files from a tar.lz archive have been
documented in the manual, in the output of '--help', and in the man page.

Tarlz is mentioned in the manual as an alternative to tar + plzip.

Several fixes and improvements have been made to the manual.

8 new test files have been added to the testsuite.

README

@@ -1,30 +1,36 @@
Description
-----------
Plzip is a massively parallel (multi-threaded) implementation of lzip,
fully compatible with lzip 1.4 or newer. Plzip uses the compression
library lzlib.

Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip uses a simplified form of the
'Lempel-Ziv-Markov chain-Algorithm' (LZMA) stream format, chosen to
maximize safety and interoperability. Lzip can compress about as fast as
gzip (lzip -0) or compress most files more than bzip2 (lzip -9).
Decompression speed is intermediate between gzip and bzip2. Lzip is
better than gzip and bzip2 from a data recovery perspective. Lzip has
been designed, written, and tested with great care to replace gzip and
bzip2 as the standard general-purpose compressed format for unix-like
systems.

Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio
(0.4 to 2 percent larger compressed files). Note that the number of
usable threads is limited by file size; on files larger than a few GB
plzip can use hundreds of processors, but on files of only a few MB plzip
is no faster than lzip.

For creation and manipulation of compressed tar archives tarlz can be
more efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members.

When compressing, plzip divides the input file into chunks and compresses
as many chunks simultaneously as worker threads are chosen, creating a
multimember compressed file.

When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen. Files that were compressed with lzip will not
be decompressed faster than using lzip (unless the option '-b' was used)
because lzip usually produces single-member files, which can't be
decompressed in parallel.
@@ -32,34 +38,34 @@ The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:

  * The lzip format provides very safe integrity checking and some data
    recovery means. The program lziprecover can repair bit flip errors
    (one of the most common forms of data corruption) in lzip files, and
    provides data recovery capabilities, including error-checked merging
    of damaged copies of a file.

  * The lzip format is as simple as possible (but not simpler). The lzip
    manual provides the source code of a simple decompressor along with a
    detailed explanation of how it works, so that with the only help of
    the lzip manual it would be possible for a digital archaeologist to
    extract the data from a lzip file long after quantum computers
    eventually render LZMA obsolete.

  * Additionally the lzip reference implementation is copylefted, which
    guarantees that it will remain free forever.

A nice feature of the lzip format is that a corrupt byte is easier to
repair the nearer it is from the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.

Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.

Plzip will automatically use for each file the largest dictionary size
that exceeds neither the file size nor the limit given. Keep in mind that
the decompression memory requirement is affected at compression time by
the choice of dictionary size limit.
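The dictionary-size rule just described can be sketched as follows. This is an illustrative example, not plzip's actual code: the function name and the 4 KiB lower bound are assumptions for the sketch, and the real lzip format also constrains the maximum size and the granularity of valid sizes, which are not modeled here.

```cpp
// Illustrative sketch of the rule above: use the largest dictionary
// size that exceeds neither the file size nor the given limit.
// The 4 KiB floor is an assumed minimum for this example.
long long choose_dictionary_size( const long long file_size,
                                  const long long limit )
  {
  const long long min_dict_size = 4096;			// 4 KiB (assumed)
  long long size = ( limit < file_size ) ? limit : file_size;
  if( size < min_dict_size ) size = min_dict_size;	// clamp to floor
  return size;
  }
```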
When compressing, plzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".

@@ -76,28 +82,28 @@ possible, ownership of the file just as 'cp -p' does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).

Plzip is able to read from some types of non-regular files if either the
option '-c' or the option '-o' is specified.

If no file names are specified, plzip compresses (or decompresses) from
standard input to standard output. Plzip will refuse to read compressed
data from a terminal or write compressed data to a terminal, as this
would be entirely incomprehensible and might leave the terminal in an
abnormal state.

Plzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
corresponding decompressed files. Integrity testing of concatenated
compressed files is also supported.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
have been compressed. Decompressed is used to refer to data which have
undergone the process of decompression.

Copyright (C) 2009-2021 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute, and modify it.

The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions that configure

arg_parser.cc

@@ -1,15 +1,15 @@
/* Arg_parser - POSIX/GNU command line argument parser. (C++ version)
   Copyright (C) 2006-2021 Antonio Diaz Diaz.

   This library is free software. Redistribution and use in source and
   binary forms, with or without modification, are permitted provided
   that the following conditions are met:

   1. Redistributions of source code must retain the above copyright
   notice, this list of conditions, and the following disclaimer.

   2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions, and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   This library is distributed in the hope that it will be useful,
@ -167,7 +167,7 @@ Arg_parser::Arg_parser( const int argc, const char * const argv[],
else non_options.push_back( argv[argind++] ); else non_options.push_back( argv[argind++] );
} }
} }
if( error_.size() ) data.clear(); if( !error_.empty() ) data.clear();
else else
{ {
for( unsigned i = 0; i < non_options.size(); ++i ) for( unsigned i = 0; i < non_options.size(); ++i )
@ -190,7 +190,7 @@ Arg_parser::Arg_parser( const char * const opt, const char * const arg,
{ if( opt[2] ) parse_long_option( opt, arg, options, argind ); } { if( opt[2] ) parse_long_option( opt, arg, options, argind ); }
else else
parse_short_option( opt, arg, options, argind ); parse_short_option( opt, arg, options, argind );
if( error_.size() ) data.clear(); if( !error_.empty() ) data.clear();
} }
else data.push_back( Record( opt ) ); else data.push_back( Record( opt ) );
} }

View file

@@ -1,15 +1,15 @@
 /* Arg_parser - POSIX/GNU command line argument parser. (C++ version)
-   Copyright (C) 2006-2019 Antonio Diaz Diaz.
+   Copyright (C) 2006-2021 Antonio Diaz Diaz.
 
    This library is free software. Redistribution and use in source and
    binary forms, with or without modification, are permitted provided
    that the following conditions are met:
 
    1. Redistributions of source code must retain the above copyright
-      notice, this list of conditions and the following disclaimer.
+      notice, this list of conditions, and the following disclaimer.
 
    2. Redistributions in binary form must reproduce the above copyright
-      notice, this list of conditions and the following disclaimer in the
+      notice, this list of conditions, and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
 
    This library is distributed in the hope that it will be useful,
@@ -18,7 +18,7 @@
 */
 
 /* Arg_parser reads the arguments in 'argv' and creates a number of
-   option codes, option arguments and non-option arguments.
+   option codes, option arguments, and non-option arguments.
 
    In case of error, 'error' returns a non-empty error message.
@@ -61,6 +61,7 @@ private:
     explicit Record( const char * const arg ) : code( 0 ), argument( arg ) {}
     };
 
+  const std::string empty_arg;
   std::string error_;
   std::vector< Record > data;
@@ -73,17 +74,17 @@ public:
   Arg_parser( const int argc, const char * const argv[],
               const Option options[], const bool in_order = false );
 
-  // Restricted constructor. Parses a single token and argument (if any)
+  // Restricted constructor. Parses a single token and argument (if any).
   Arg_parser( const char * const opt, const char * const arg,
               const Option options[] );
 
   const std::string & error() const { return error_; }
 
-  // The number of arguments parsed (may be different from argc)
+  // The number of arguments parsed. May be different from argc.
   int arguments() const { return data.size(); }
 
-  // If code( i ) is 0, argument( i ) is a non-option.
-  // Else argument( i ) is the option's argument (or empty).
+  /* If code( i ) is 0, argument( i ) is a non-option.
+     Else argument( i ) is the option's argument (or empty). */
   int code( const int i ) const
     {
     if( i >= 0 && i < arguments() ) return data[i].code;
@@ -93,6 +94,6 @@ public:
   const std::string & argument( const int i ) const
     {
     if( i >= 0 && i < arguments() ) return data[i].argument;
-    else return error_;
+    else return empty_arg;
     }
   };

View file

@@ -1,6 +1,6 @@
 /* Plzip - Massively parallel implementation of lzip
    Copyright (C) 2009 Laszlo Ersek.
-   Copyright (C) 2009-2019 Antonio Diaz Diaz.
+   Copyright (C) 2009-2021 Antonio Diaz Diaz.
 
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -27,7 +27,6 @@
 #include <cstring>
 #include <string>
 #include <vector>
-#include <pthread.h>
 #include <stdint.h>
 #include <unistd.h>
 #include <lzlib.h>
@@ -39,9 +38,9 @@
 #endif
 
-// Returns the number of bytes really read.
-// If (returned value < size) and (errno == 0), means EOF was reached.
-//
+/* Returns the number of bytes really read.
+   If (returned value < size) and (errno == 0), means EOF was reached.
+*/
 int readblock( const int fd, uint8_t * const buf, const int size )
   {
   int sz = 0;
@@ -58,9 +57,9 @@ int readblock( const int fd, uint8_t * const buf, const int size )
   }
 
-// Returns the number of bytes really written.
-// If (returned value < size), it is always an error.
-//
+/* Returns the number of bytes really written.
+   If (returned value < size), it is always an error.
+*/
 int writeblock( const int fd, const uint8_t * const buf, const int size )
   {
   int sz = 0;
@@ -150,7 +149,7 @@ namespace {
 unsigned long long in_size = 0;
 unsigned long long out_size = 0;
 
-const char * const mem_msg = "Not enough memory. Try a smaller dictionary size";
+const char * const mem_msg2 = "Not enough memory. Try a smaller dictionary size.";
 
 struct Packet           // data block with a serial number
@@ -235,8 +234,7 @@ public:
     xunlock( &imutex );
     if( !ipacket )              // EOF
       {
-      // notify muxer when last worker exits
-      xlock( &omutex );
+      xlock( &omutex );         // notify muxer when last worker exits
       if( --num_working == 0 ) xsignal( &oav_or_exit );
       xunlock( &omutex );
       }
@@ -284,12 +282,16 @@ public:
   void return_empty_packet()    // return a slot to the tally
     { slot_tally.leave_slot(); }
 
-  void finish()                 // splitter has no more packets to send
+  void finish( const int workers_spared )
     {
-    xlock( &imutex );
+    xlock( &imutex );           // splitter has no more packets to send
    eof = true;
    xbroadcast( &iav_or_eof );
    xunlock( &imutex );
+    xlock( &omutex );           // notify muxer if all workers have exited
+    num_working -= workers_spared;
+    if( num_working <= 0 ) xsignal( &oav_or_exit );
+    xunlock( &omutex );
    }
 
   bool finished()               // all packets delivered to muxer
@@ -303,52 +305,6 @@ public:
   };
 
-struct Splitter_arg
-  {
-  Packet_courier * courier;
-  const Pretty_print * pp;
-  int infd;
-  int data_size;
-  int offset;
-  };
-
-
-// split data from input file into chunks and pass them to
-// courier for packaging and distribution to workers.
-extern "C" void * csplitter( void * arg )
-  {
-  const Splitter_arg & tmp = *(const Splitter_arg *)arg;
-  Packet_courier & courier = *tmp.courier;
-  const Pretty_print & pp = *tmp.pp;
-  const int infd = tmp.infd;
-  const int data_size = tmp.data_size;
-  const int offset = tmp.offset;
-
-  for( bool first_post = true; ; first_post = false )
-    {
-    uint8_t * const data = new( std::nothrow ) uint8_t[offset+data_size];
-    if( !data ) { pp( mem_msg ); cleanup_and_fail(); }
-    const int size = readblock( infd, data + offset, data_size );
-    if( size != data_size && errno )
-      { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
-    if( size > 0 || first_post )               // first packet may be empty
-      {
-      in_size += size;
-      courier.receive_packet( data, size );
-      if( size < data_size ) break;            // EOF
-      }
-    else
-      {
-      delete[] data;
-      break;
-      }
-    }
-  courier.finish();             // no more packets to send
-  return 0;
-  }
-
-
 struct Worker_arg
   {
   Packet_courier * courier;
@@ -358,9 +314,18 @@ struct Worker_arg
   int offset;
   };
 
+struct Splitter_arg
+  {
+  struct Worker_arg worker_arg;
+  pthread_t * worker_threads;
+  int infd;
+  int data_size;
+  int num_workers;              // returned by splitter to main thread
+  };
+
-// get packets from courier, replace their contents, and return
-// them to courier.
+/* Get packets from courier, replace their contents, and return them to
+   courier. */
 extern "C" void * cworker( void * arg )
   {
   const Worker_arg & tmp = *(const Worker_arg *)arg;
@@ -386,7 +351,7 @@ extern "C" void * cworker( void * arg )
     if( !encoder || LZ_compress_errno( encoder ) != LZ_ok )
       {
       if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error )
-        pp( mem_msg );
+        pp( mem_msg2 );
       else
         internal_error( "invalid argument to encoder." );
       cleanup_and_fail();
@@ -435,8 +400,57 @@ extern "C" void * cworker( void * arg )
   }
 
-// get from courier the processed and sorted packets, and write
-// their contents to the output file.
+/* Split data from input file into chunks and pass them to courier for
+   packaging and distribution to workers.
+   Start a worker per packet up to a maximum of num_workers.
+*/
+extern "C" void * csplitter( void * arg )
+  {
+  Splitter_arg & tmp = *(Splitter_arg *)arg;
+  Packet_courier & courier = *tmp.worker_arg.courier;
+  const Pretty_print & pp = *tmp.worker_arg.pp;
+  pthread_t * const worker_threads = tmp.worker_threads;
+  const int offset = tmp.worker_arg.offset;
+  const int infd = tmp.infd;
+  const int data_size = tmp.data_size;
+  int i = 0;                            // number of workers started
+
+  for( bool first_post = true; ; first_post = false )
+    {
+    uint8_t * const data = new( std::nothrow ) uint8_t[offset+data_size];
+    if( !data ) { pp( mem_msg2 ); cleanup_and_fail(); }
+    const int size = readblock( infd, data + offset, data_size );
+    if( size != data_size && errno )
+      { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
+    if( size > 0 || first_post )        // first packet may be empty
+      {
+      in_size += size;
+      courier.receive_packet( data, size );
+      if( i < tmp.num_workers )         // start a new worker
+        {
+        const int errcode =
+          pthread_create( &worker_threads[i++], 0, cworker, &tmp.worker_arg );
+        if( errcode ) { show_error( "Can't create worker threads", errcode );
+                        cleanup_and_fail(); }
+        }
+      if( size < data_size ) break;     // EOF
+      }
+    else
+      {
+      delete[] data;
+      break;
+      }
+    }
+  courier.finish( tmp.num_workers - i );        // no more packets to send
+  tmp.num_workers = i;
+  return 0;
+  }
+
+
+/* Get from courier the processed and sorted packets, and write their
+   contents to the output file.
+*/
 void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
   {
   std::vector< const Packet * > packet_vector;
@@ -450,8 +464,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
     const Packet * const opacket = packet_vector[i];
     out_size += opacket->size;
 
-    const int wr = writeblock( outfd, opacket->data, opacket->size );
-    if( wr != opacket->size )
+    if( writeblock( outfd, opacket->data, opacket->size ) != opacket->size )
       { pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
     delete[] opacket->data;
     courier.return_empty_packet();
@@ -462,8 +475,8 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
 } // end namespace
 
-// init the courier, then start the splitter and the workers and
-// call the muxer.
+/* Init the courier, then start the splitter and the workers and call the
+   muxer. */
 int compress( const unsigned long long cfile_size,
               const int data_size, const int dictionary_size,
               const int match_len_limit, const int num_workers,
@@ -478,50 +491,44 @@ int compress( const unsigned long long cfile_size,
   out_size = 0;
   Packet_courier courier( num_workers, num_slots );
 
+  if( debug_level & 2 ) std::fputs( "compress.\n", stderr );
+
+  pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
+  if( !worker_threads ) { pp( mem_msg ); return 1; }
+
   Splitter_arg splitter_arg;
-  splitter_arg.courier = &courier;
-  splitter_arg.pp = &pp;
+  splitter_arg.worker_arg.courier = &courier;
+  splitter_arg.worker_arg.pp = &pp;
+  splitter_arg.worker_arg.dictionary_size = dictionary_size;
+  splitter_arg.worker_arg.match_len_limit = match_len_limit;
+  splitter_arg.worker_arg.offset = offset;
+  splitter_arg.worker_threads = worker_threads;
   splitter_arg.infd = infd;
   splitter_arg.data_size = data_size;
-  splitter_arg.offset = offset;
+  splitter_arg.num_workers = num_workers;
 
   pthread_t splitter_thread;
   int errcode = pthread_create( &splitter_thread, 0, csplitter, &splitter_arg );
   if( errcode )
-    { show_error( "Can't create splitter thread", errcode ); cleanup_and_fail(); }
+    { show_error( "Can't create splitter thread", errcode );
+      delete[] worker_threads; return 1; }
   if( verbosity >= 1 ) pp();
   show_progress( 0, cfile_size, &pp );          // init
 
-  Worker_arg worker_arg;
-  worker_arg.courier = &courier;
-  worker_arg.pp = &pp;
-  worker_arg.dictionary_size = dictionary_size;
-  worker_arg.match_len_limit = match_len_limit;
-  worker_arg.offset = offset;
-
-  pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
-  if( !worker_threads ) { pp( mem_msg ); cleanup_and_fail(); }
-  for( int i = 0; i < num_workers; ++i )
-    {
-    errcode = pthread_create( worker_threads + i, 0, cworker, &worker_arg );
-    if( errcode )
-      { show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
-    }
-
   muxer( courier, pp, outfd );
 
-  for( int i = num_workers - 1; i >= 0; --i )
-    {
+  errcode = pthread_join( splitter_thread, 0 );
+  if( errcode ) { show_error( "Can't join splitter thread", errcode );
+                  cleanup_and_fail(); }
+
+  for( int i = splitter_arg.num_workers; --i >= 0; )
+    {                           // join only the workers started
     errcode = pthread_join( worker_threads[i], 0 );
-    if( errcode )
-      { show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
+    if( errcode ) { show_error( "Can't join worker threads", errcode );
+                    cleanup_and_fail(); }
     }
   delete[] worker_threads;
 
-  errcode = pthread_join( splitter_thread, 0 );
-  if( errcode )
-    { show_error( "Can't join splitter thread", errcode ); cleanup_and_fail(); }
-
   if( verbosity >= 1 )
     {
     if( in_size == 0 || out_size == 0 )
@@ -537,14 +544,14 @@ int compress( const unsigned long long cfile_size,
   if( debug_level & 1 )
     std::fprintf( stderr,
+            "workers started                           %8u\n"
             "any worker tried to consume from splitter %8u times\n"
             "any worker had to wait                    %8u times\n"
             "muxer tried to consume from workers       %8u times\n"
             "muxer had to wait                         %8u times\n",
-            courier.icheck_counter,
-            courier.iwait_counter,
-            courier.ocheck_counter,
-            courier.owait_counter );
+            splitter_arg.num_workers,
+            courier.icheck_counter, courier.iwait_counter,
+            courier.ocheck_counter, courier.owait_counter );
 
   if( !courier.finished() ) internal_error( "courier not finished." );
   return 0;

configure
View file

@@ -1,12 +1,12 @@
 #! /bin/sh
 # configure script for Plzip - Massively parallel implementation of lzip
-# Copyright (C) 2009-2019 Antonio Diaz Diaz.
+# Copyright (C) 2009-2021 Antonio Diaz Diaz.
 #
 # This configure script is free software: you have unlimited permission
-# to copy, distribute and modify it.
+# to copy, distribute, and modify it.
 
 pkgname=plzip
-pkgversion=1.8
+pkgversion=1.9
 progname=plzip
 with_mingw=
 srctrigger=doc/${pkgname}.texi
@@ -27,11 +27,7 @@ CXXFLAGS='-Wall -W -O2'
 LDFLAGS=
 
 # checking whether we are using GNU C++.
-/bin/sh -c "${CXX} --version" > /dev/null 2>&1 ||
-  {
-  CXX=c++
-  CXXFLAGS=-O2
-  }
+/bin/sh -c "${CXX} --version" > /dev/null 2>&1 || { CXX=c++ ; CXXFLAGS=-O2 ; }
 
 # Loop over all args
 args=
@@ -43,11 +39,12 @@ while [ $# != 0 ] ; do
   shift
 
   # Add the argument quoted to args
-  args="${args} \"${option}\""
+  if [ -z "${args}" ] ; then args="\"${option}\""
+  else args="${args} \"${option}\"" ; fi
 
   # Split out the argument for options that take them
   case ${option} in
-    *=*) optarg=`echo ${option} | sed -e 's,^[^=]*=,,;s,/$,,'` ;;
+    *=*) optarg=`echo "${option}" | sed -e 's,^[^=]*=,,;s,/$,,'` ;;
   esac
 
   # Process the options
@@ -128,7 +125,7 @@ if [ -z "${srcdir}" ] ; then
   if [ ! -r "${srcdir}/${srctrigger}" ] ; then srcdir=.. ; fi
   if [ ! -r "${srcdir}/${srctrigger}" ] ; then
     ## the sed command below emulates the dirname command
-    srcdir=`echo $0 | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
+    srcdir=`echo "$0" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
   fi
 fi
@@ -151,7 +148,7 @@ if [ -z "${no_create}" ] ; then
 # Run this file to recreate the current configuration.
 #
 # This script is free software: you have unlimited permission
-# to copy, distribute and modify it.
+# to copy, distribute, and modify it.
 
 exec /bin/sh $0 ${args} --no-create
 EOF
@@ -174,11 +171,11 @@ echo "LDFLAGS = ${LDFLAGS}"
 rm -f Makefile
 cat > Makefile << EOF
 # Makefile for Plzip - Massively parallel implementation of lzip
-# Copyright (C) 2009-2019 Antonio Diaz Diaz.
+# Copyright (C) 2009-2021 Antonio Diaz Diaz.
 # This file was generated automatically by configure. Don't edit.
 #
 # This Makefile is free software: you have unlimited permission
-# to copy, distribute and modify it.
+# to copy, distribute, and modify it.
 
 pkgname = ${pkgname}
 pkgversion = ${pkgversion}
@@ -199,5 +196,5 @@ EOF
 cat "${srcdir}/Makefile.in" >> Makefile
 
 echo "OK. Now you can run make."
-echo "If make fails, verify that the lzlib compression library is correctly"
+echo "If make fails, verify that the compression library lzlib is correctly"
 echo "installed (see INSTALL)."

View file

@ -1,6 +1,6 @@
/* Plzip - Massively parallel implementation of lzip /* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek. Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2019 Antonio Diaz Diaz. Copyright (C) 2009-2021 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -28,7 +28,6 @@
#include <queue> #include <queue>
#include <string> #include <string>
#include <vector> #include <vector>
#include <pthread.h>
#include <stdint.h> #include <stdint.h>
#include <unistd.h> #include <unistd.h>
#include <lzlib.h> #include <lzlib.h>
@ -44,10 +43,13 @@ enum { max_packet_size = 1 << 20 };
struct Packet // data block struct Packet // data block
{ {
uint8_t * data; // data == 0 means end of member uint8_t * data; // data may be null if size == 0
int size; // number of bytes in data (if any) int size; // number of bytes in data (if any)
explicit Packet( uint8_t * const d = 0, const int s = 0 ) bool eom; // end of member
: data( d ), size( s ) {} Packet() : data( 0 ), size( 0 ), eom( true ) {}
Packet( uint8_t * const d, const int s, const bool e )
: data( d ), size( s ), eom ( e ) {}
~Packet() { if( data ) delete[] data; }
}; };
@ -58,23 +60,25 @@ public:
unsigned owait_counter; unsigned owait_counter;
private: private:
int deliver_worker_id; // worker queue currently delivering packets int deliver_worker_id; // worker queue currently delivering packets
std::vector< std::queue< Packet * > > opacket_queues; std::vector< std::queue< const Packet * > > opacket_queues;
int num_working; // number of workers still running int num_working; // number of workers still running
const int num_workers; // number of workers const int num_workers; // number of workers
const unsigned out_slots; // max output packets per queue const unsigned out_slots; // max output packets per queue
pthread_mutex_t omutex; pthread_mutex_t omutex;
pthread_cond_t oav_or_exit; // output packet available or all workers exited pthread_cond_t oav_or_exit; // output packet available or all workers exited
std::vector< pthread_cond_t > slot_av; // output slot available std::vector< pthread_cond_t > slot_av; // output slot available
const Shared_retval & shared_retval; // discard new packets on error
Packet_courier( const Packet_courier & ); // declared as private Packet_courier( const Packet_courier & ); // declared as private
void operator=( const Packet_courier & ); // declared as private void operator=( const Packet_courier & ); // declared as private
public: public:
Packet_courier( const int workers, const int slots ) Packet_courier( const Shared_retval & sh_ret, const int workers,
: ocheck_counter( 0 ), owait_counter( 0 ), const int slots )
deliver_worker_id( 0 ), : ocheck_counter( 0 ), owait_counter( 0 ), deliver_worker_id( 0 ),
opacket_queues( workers ), num_working( workers ), opacket_queues( workers ), num_working( workers ),
num_workers( workers ), out_slots( slots ), slot_av( workers ) num_workers( workers ), out_slots( slots ), slot_av( workers ),
shared_retval( sh_ret )
{ {
xinit_mutex( &omutex ); xinit_cond( &oav_or_exit ); xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] ); for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] );
@ -82,6 +86,10 @@ public:
~Packet_courier() ~Packet_courier()
{ {
if( shared_retval() ) // cleanup to avoid memory leaks
for( int i = 0; i < num_workers; ++i )
while( !opacket_queues[i].empty() )
{ delete opacket_queues[i].front(); opacket_queues[i].pop(); }
for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] ); for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex ); xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
} }
@ -94,25 +102,28 @@ public:
xunlock( &omutex ); xunlock( &omutex );
} }
// collect a packet from a worker // collect a packet from a worker, discard packet on error
void collect_packet( Packet * const opacket, const int worker_id ) void collect_packet( const Packet * const opacket, const int worker_id )
{ {
xlock( &omutex ); xlock( &omutex );
if( opacket->data ) if( opacket->data )
{
while( opacket_queues[worker_id].size() >= out_slots ) while( opacket_queues[worker_id].size() >= out_slots )
{
if( shared_retval() ) { delete opacket; goto done; }
xwait( &slot_av[worker_id], &omutex ); xwait( &slot_av[worker_id], &omutex );
} }
opacket_queues[worker_id].push( opacket ); opacket_queues[worker_id].push( opacket );
if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit ); if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
done:
xunlock( &omutex ); xunlock( &omutex );
} }
// deliver a packet to muxer /* deliver a packet to muxer
// if packet data == 0, move to next queue and wait again if packet->eom, move to next queue
Packet * deliver_packet() if packet data == 0, wait again */
const Packet * deliver_packet()
{ {
Packet * opacket = 0; const Packet * opacket = 0;
xlock( &omutex ); xlock( &omutex );
++ocheck_counter; ++ocheck_counter;
while( true ) while( true )
@ -127,8 +138,9 @@ public:
opacket_queues[deliver_worker_id].pop(); opacket_queues[deliver_worker_id].pop();
if( opacket_queues[deliver_worker_id].size() + 1 == out_slots ) if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
xsignal( &slot_av[deliver_worker_id] ); xsignal( &slot_av[deliver_worker_id] );
if( opacket->eom && ++deliver_worker_id >= num_workers )
deliver_worker_id = 0;
if( opacket->data ) break; if( opacket->data ) break;
if( ++deliver_worker_id >= num_workers ) deliver_worker_id = 0;
delete opacket; opacket = 0; delete opacket; opacket = 0;
} }
xunlock( &omutex ); xunlock( &omutex );
@ -150,32 +162,34 @@ struct Worker_arg
const Lzip_index * lzip_index; const Lzip_index * lzip_index;
Packet_courier * courier; Packet_courier * courier;
const Pretty_print * pp; const Pretty_print * pp;
Shared_retval * shared_retval;
int worker_id; int worker_id;
int num_workers; int num_workers;
int infd; int infd;
}; };
// read members from file, decompress their contents, and /* Read members from file, decompress their contents, and give to courier
// give the produced packets to courier. the packets produced.
*/
extern "C" void * dworker_o( void * arg ) extern "C" void * dworker_o( void * arg )
{ {
const Worker_arg & tmp = *(const Worker_arg *)arg; const Worker_arg & tmp = *(const Worker_arg *)arg;
const Lzip_index & lzip_index = *tmp.lzip_index; const Lzip_index & lzip_index = *tmp.lzip_index;
Packet_courier & courier = *tmp.courier; Packet_courier & courier = *tmp.courier;
const Pretty_print & pp = *tmp.pp; const Pretty_print & pp = *tmp.pp;
Shared_retval & shared_retval = *tmp.shared_retval;
const int worker_id = tmp.worker_id; const int worker_id = tmp.worker_id;
const int num_workers = tmp.num_workers; const int num_workers = tmp.num_workers;
const int infd = tmp.infd; const int infd = tmp.infd;
const int buffer_size = 65536; const int buffer_size = 65536;
uint8_t * new_data = new( std::nothrow ) uint8_t[max_packet_size]; int new_pos = 0;
uint8_t * new_data = 0;
uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size]; uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size];
LZ_Decoder * const decoder = LZ_decompress_open(); LZ_Decoder * const decoder = LZ_decompress_open();
if( !new_data || !ibuffer || !decoder || if( !ibuffer || !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
LZ_decompress_errno( decoder ) != LZ_ok ) { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
{ pp( "Not enough memory." ); cleanup_and_fail(); }
int new_pos = 0;
for( long i = worker_id; i < lzip_index.members(); i += num_workers ) for( long i = worker_id; i < lzip_index.members(); i += num_workers )
{ {
@ -184,6 +198,7 @@ extern "C" void * dworker_o( void * arg )
while( member_rest > 0 ) while( member_rest > 0 )
{ {
if( shared_retval() ) goto done; // other worker found a problem
while( LZ_decompress_write_size( decoder ) > 0 ) while( LZ_decompress_write_size( decoder ) > 0 )
{ {
const int size = std::min( LZ_decompress_write_size( decoder ), const int size = std::min( LZ_decompress_write_size( decoder ),
@ -191,7 +206,8 @@ extern "C" void * dworker_o( void * arg )
if( size > 0 ) if( size > 0 )
{ {
if( preadblock( infd, ibuffer, size, member_pos ) != size ) if( preadblock( infd, ibuffer, size, member_pos ) != size )
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); } { if( shared_retval.set_value( 1 ) )
{ pp(); show_error( "Read error", errno ); } goto done; }
member_pos += size; member_pos += size;
member_rest -= size; member_rest -= size;
if( LZ_decompress_write( decoder, ibuffer, size ) != size ) if( LZ_decompress_write( decoder, ibuffer, size ) != size )
@ -201,60 +217,60 @@ extern "C" void * dworker_o( void * arg )
           }
         while( true )			// read and pack decompressed data
           {
+          if( !new_data &&
+              !( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) )
+            { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
           const int rd = LZ_decompress_read( decoder, new_data + new_pos,
                                              max_packet_size - new_pos );
           if( rd < 0 )
-            cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
+            { decompress_error( decoder, pp, shared_retval, worker_id );
+              goto done; }
           new_pos += rd;
           if( new_pos > max_packet_size )
             internal_error( "opacket size exceeded in worker." );
-          if( new_pos == max_packet_size ||
-              LZ_decompress_finished( decoder ) == 1 )
+          const bool eom = LZ_decompress_finished( decoder ) == 1;
+          if( new_pos == max_packet_size || eom )	// make data packet
            {
-            if( new_pos > 0 )			// make data packet
-              {
-              Packet * const opacket = new Packet( new_data, new_pos );
-              courier.collect_packet( opacket, worker_id );
-              new_pos = 0;
-              new_data = new( std::nothrow ) uint8_t[max_packet_size];
-              if( !new_data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
-              }
-            if( LZ_decompress_finished( decoder ) == 1 )
-              {					// end of member token
-              courier.collect_packet( new Packet, worker_id );
-              LZ_decompress_reset( decoder );	// prepare for new member
-              break;
-              }
+            const Packet * const opacket =
+              new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
+            courier.collect_packet( opacket, worker_id );
+            if( new_pos > 0 ) { new_pos = 0; new_data = 0; }
+            if( eom )
+              { LZ_decompress_reset( decoder );	// prepare for new member
+                break; }
            }
          if( rd == 0 ) break;
          }
        }
      show_progress( lzip_index.mblock( i ).size() );
      }
-  delete[] ibuffer; delete[] new_data;
-  if( LZ_decompress_member_position( decoder ) != 0 )
-    { pp( "Error, some data remains in decoder." ); cleanup_and_fail(); }
-  if( LZ_decompress_close( decoder ) < 0 )
-    { pp( "LZ_decompress_close failed." ); cleanup_and_fail(); }
+done:
+  delete[] ibuffer; if( new_data ) delete[] new_data;
+  if( LZ_decompress_member_position( decoder ) != 0 &&
+      shared_retval.set_value( 1 ) )
+    pp( "Error, some data remains in decoder." );
+  if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
+    pp( "LZ_decompress_close failed." );
   courier.worker_finished();
   return 0;
   }
-// get from courier the processed and sorted packets, and write
-// their contents to the output file.
-void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
+/* Get from courier the processed and sorted packets, and write their
+   contents to the output file. Drain queue on error. */
+void muxer( Packet_courier & courier, const Pretty_print & pp,
+            Shared_retval & shared_retval, const int outfd )
   {
   while( true )
     {
-    Packet * const opacket = courier.deliver_packet();
+    const Packet * const opacket = courier.deliver_packet();
     if( !opacket ) break;	// queue is empty. all workers exited
-    const int wr = writeblock( outfd, opacket->data, opacket->size );
-    if( wr != opacket->size )
-      { pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
-    delete[] opacket->data;
+    if( shared_retval() == 0 &&
+        writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
+        shared_retval.set_value( 1 ) )
+      { pp(); show_error( "Write error", errno ); }
    delete opacket;
    }
  }
@@ -262,66 +278,59 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
 } // end namespace

 // init the courier, then start the workers and call the muxer.
 int dec_stdout( const int num_workers, const int infd, const int outfd,
                 const Pretty_print & pp, const int debug_level,
                 const int out_slots, const Lzip_index & lzip_index )
   {
-  Packet_courier courier( num_workers, out_slots );
+  Shared_retval shared_retval;
+  Packet_courier courier( shared_retval, num_workers, out_slots );
   Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
   pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
   if( !worker_args || !worker_threads )
-    { pp( "Not enough memory." ); cleanup_and_fail(); }
-  for( int i = 0; i < num_workers; ++i )
+    { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }
+  int i = 0;				// number of workers started
+  for( ; i < num_workers; ++i )
     {
     worker_args[i].lzip_index = &lzip_index;
     worker_args[i].courier = &courier;
     worker_args[i].pp = &pp;
+    worker_args[i].shared_retval = &shared_retval;
     worker_args[i].worker_id = i;
     worker_args[i].num_workers = num_workers;
     worker_args[i].infd = infd;
     const int errcode =
       pthread_create( &worker_threads[i], 0, dworker_o, &worker_args[i] );
     if( errcode )
-      { show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
+      { if( shared_retval.set_value( 1 ) )
+        { show_error( "Can't create worker threads", errcode ); } break; }
     }

-  muxer( courier, pp, outfd );
+  muxer( courier, pp, shared_retval, outfd );

-  for( int i = num_workers - 1; i >= 0; --i )
+  while( --i >= 0 )
     {
     const int errcode = pthread_join( worker_threads[i], 0 );
-    if( errcode )
-      { show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
+    if( errcode && shared_retval.set_value( 1 ) )
+      show_error( "Can't join worker threads", errcode );
     }
   delete[] worker_threads;
   delete[] worker_args;

-  if( verbosity >= 2 )
-    {
-    if( verbosity >= 4 ) show_header( lzip_index.dictionary_size( 0 ) );
-    const unsigned long long in_size = lzip_index.cdata_size();
-    const unsigned long long out_size = lzip_index.udata_size();
-    if( out_size == 0 || in_size == 0 )
-      std::fputs( "no data compressed.  ", stderr );
-    else
-      std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved.  ",
-                    (double)out_size / in_size,
-                    ( 100.0 * in_size ) / out_size,
-                    100.0 - ( ( 100.0 * in_size ) / out_size ) );
-    if( verbosity >= 3 )
-      std::fprintf( stderr, "decompressed %9llu, compressed %8llu.  ",
-                    out_size, in_size );
-    }
-  if( verbosity >= 1 ) std::fputs( "done\n", stderr );
+  if( shared_retval() ) return shared_retval();	// some thread found a problem

+  if( verbosity >= 1 )
+    show_results( lzip_index.cdata_size(), lzip_index.udata_size(),
+                  lzip_index.dictionary_size(), false );

   if( debug_level & 1 )
     std::fprintf( stderr,
+      "workers started                           %8u\n"
       "muxer tried to consume from workers       %8u times\n"
       "muxer had to wait                         %8u times\n",
-      courier.ocheck_counter,
-      courier.owait_counter );
+      num_workers, courier.ocheck_counter, courier.owait_counter );

   if( !courier.finished() ) internal_error( "courier not finished." );
   return 0;
@@ -1,6 +1,6 @@
 /* Plzip - Massively parallel implementation of lzip
    Copyright (C) 2009 Laszlo Ersek.
-   Copyright (C) 2009-2019 Antonio Diaz Diaz.
+   Copyright (C) 2009-2021 Antonio Diaz Diaz.

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -28,13 +28,19 @@
 #include <queue>
 #include <string>
 #include <vector>
+#include <pthread.h>
 #include <stdint.h>
 #include <unistd.h>
 #include <lzlib.h>

 #include "lzip.h"

+/* When a problem is detected by any thread:
+   - the thread sets shared_retval to 1 or 2.
+   - the splitter sets eof and returns.
+   - the courier discards new packets received or collected.
+   - the workers drain the queue and return.
+   - the muxer drains the queue and returns.
+   (Draining seems to be faster than cleaning up later). */
 namespace {
@@ -45,10 +51,13 @@ unsigned long long out_size = 0;
 struct Packet			// data block
   {
-  uint8_t * data;		// data == 0 means end of member
+  uint8_t * data;		// data may be null if size == 0
   int size;			// number of bytes in data (if any)
-  explicit Packet( uint8_t * const d = 0, const int s = 0 )
-    : data( d ), size( s ) {}
+  bool eom;			// end of member
+  Packet() : data( 0 ), size( 0 ), eom( true ) {}
+  Packet( uint8_t * const d, const int s, const bool e )
+    : data( d ), size( s ), eom ( e ) {}
+  ~Packet() { if( data ) delete[] data; }
   };
@@ -63,8 +72,8 @@ private:
   int receive_worker_id;	// worker queue currently receiving packets
   int deliver_worker_id;	// worker queue currently delivering packets
   Slot_tally slot_tally;	// limits the number of input packets
-  std::vector< std::queue< Packet * > > ipacket_queues;
-  std::vector< std::queue< Packet * > > opacket_queues;
+  std::vector< std::queue< const Packet * > > ipacket_queues;
+  std::vector< std::queue< const Packet * > > opacket_queues;
   int num_working;		// number of workers still running
   const int num_workers;	// number of workers
   const unsigned out_slots;	// max output packets per queue
@@ -73,20 +82,23 @@ private:
   pthread_mutex_t omutex;
   pthread_cond_t oav_or_exit;	// output packet available or all workers exited
   std::vector< pthread_cond_t > slot_av;	// output slot available
+  const Shared_retval & shared_retval;	// discard new packets on error
   bool eof;			// splitter done
+  bool trailing_data_found_;	// a worker found trailing data

   Packet_courier( const Packet_courier & );	// declared as private
   void operator=( const Packet_courier & );	// declared as private

 public:
-  Packet_courier( const int workers, const int in_slots, const int oslots )
+  Packet_courier( const Shared_retval & sh_ret, const int workers,
+                  const int in_slots, const int oslots )
     : icheck_counter( 0 ), iwait_counter( 0 ),
       ocheck_counter( 0 ), owait_counter( 0 ),
       receive_worker_id( 0 ), deliver_worker_id( 0 ),
       slot_tally( in_slots ), ipacket_queues( workers ),
       opacket_queues( workers ), num_working( workers ),
       num_workers( workers ), out_slots( oslots ), slot_av( workers ),
-      eof( false )
+      shared_retval( sh_ret ), eof( false ), trailing_data_found_( false )
    {
    xinit_mutex( &imutex ); xinit_cond( &iav_or_eof );
    xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
@@ -95,30 +107,37 @@ public:
   ~Packet_courier()
     {
+    if( shared_retval() )	// cleanup to avoid memory leaks
+      for( int i = 0; i < num_workers; ++i )
+        {
+        while( !ipacket_queues[i].empty() )
+          { delete ipacket_queues[i].front(); ipacket_queues[i].pop(); }
+        while( !opacket_queues[i].empty() )
+          { delete opacket_queues[i].front(); opacket_queues[i].pop(); }
+        }
     for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
     xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
     xdestroy_cond( &iav_or_eof ); xdestroy_mutex( &imutex );
     }
-  // make a packet with data received from splitter
-  // if data == 0 (end of member token), move to next queue
-  void receive_packet( uint8_t * const data, const int size )
+  /* Make a packet with data received from splitter.
+     If eom == true (end of member), move to next queue. */
+  void receive_packet( uint8_t * const data, const int size, const bool eom )
     {
-    Packet * const ipacket = new Packet( data, size );
-    if( data )
-      { in_size += size; slot_tally.get_slot(); }	// wait for a free slot
+    if( shared_retval() ) { delete[] data; return; } // discard packet on error
+    const Packet * const ipacket = new Packet( data, size, eom );
+    slot_tally.get_slot();		// wait for a free slot
     xlock( &imutex );
     ipacket_queues[receive_worker_id].push( ipacket );
     xbroadcast( &iav_or_eof );
     xunlock( &imutex );
-    if( !data && ++receive_worker_id >= num_workers )
-      receive_worker_id = 0;
+    if( eom && ++receive_worker_id >= num_workers ) receive_worker_id = 0;
    }
   // distribute a packet to a worker
-  Packet * distribute_packet( const int worker_id )
+  const Packet * distribute_packet( const int worker_id )
     {
-    Packet * ipacket = 0;
+    const Packet * ipacket = 0;
     xlock( &imutex );
     ++icheck_counter;
     while( ipacket_queues[worker_id].empty() && !eof )
@@ -132,37 +151,38 @@ public:
       ipacket_queues[worker_id].pop();
       }
     xunlock( &imutex );
-    if( ipacket )
-      { if( ipacket->data ) slot_tally.leave_slot(); }
-    else
+    if( ipacket ) slot_tally.leave_slot();
+    else				// no more packets
       {
-      // notify muxer when last worker exits
-      xlock( &omutex );
+      xlock( &omutex );		// notify muxer when last worker exits
       if( --num_working == 0 ) xsignal( &oav_or_exit );
       xunlock( &omutex );
       }
     return ipacket;
     }
-  // collect a packet from a worker
-  void collect_packet( Packet * const opacket, const int worker_id )
+  // collect a packet from a worker, discard packet on error
+  void collect_packet( const Packet * const opacket, const int worker_id )
     {
     xlock( &omutex );
     if( opacket->data )
+      {
       while( opacket_queues[worker_id].size() >= out_slots )
+        {
+        if( shared_retval() ) { delete opacket; goto done; }
         xwait( &slot_av[worker_id], &omutex );
+        }
+      }
     opacket_queues[worker_id].push( opacket );
     if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
+done:
     xunlock( &omutex );
     }
-  // deliver a packet to muxer
-  // if packet data == 0, move to next queue and wait again
-  Packet * deliver_packet()
+  /* deliver a packet to muxer
+     if packet->eom, move to next queue
+     if packet data == 0, wait again */
+  const Packet * deliver_packet()
     {
-    Packet * opacket = 0;
+    const Packet * opacket = 0;
     xlock( &omutex );
     ++ocheck_counter;
     while( true )
@@ -177,27 +197,37 @@ public:
       opacket_queues[deliver_worker_id].pop();
       if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
         xsignal( &slot_av[deliver_worker_id] );
+      if( opacket->eom && ++deliver_worker_id >= num_workers )
+        deliver_worker_id = 0;
       if( opacket->data ) break;
-      if( ++deliver_worker_id >= num_workers ) deliver_worker_id = 0;
       delete opacket; opacket = 0;
       }
     xunlock( &omutex );
     return opacket;
     }
-  void add_out_size( const unsigned long long partial_out_size )
+  void add_sizes( const unsigned long long partial_in_size,
+                  const unsigned long long partial_out_size )
     {
-    xlock( &omutex );
+    xlock( &imutex );
+    in_size += partial_in_size;
     out_size += partial_out_size;
-    xunlock( &omutex );
+    xunlock( &imutex );
     }

-  void finish()			// splitter has no more packets to send
+  void set_trailing_flag() { trailing_data_found_ = true; }
+  bool trailing_data_found() { return trailing_data_found_; }
+
+  void finish( const int workers_started )
     {
-    xlock( &imutex );
+    xlock( &imutex );		// splitter has no more packets to send
     eof = true;
     xbroadcast( &iav_or_eof );
     xunlock( &imutex );
+    xlock( &omutex );		// notify muxer if all workers have exited
+    num_working -= num_workers - workers_started;	// workers spared
+    if( num_working <= 0 ) xsignal( &oav_or_exit );
+    xunlock( &omutex );
     }

   bool finished()		// all packets delivered to muxer
@@ -212,173 +242,60 @@ public:
   };
-// Search forward from 'pos' for "LZIP" (Boyer-Moore algorithm)
-// Returns pos of found string or 'pos+size' if not found.
-//
-int find_magic( const uint8_t * const buffer, const int pos, const int size )
-  {
-  const uint8_t table[256] = {
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,1,4,4,3,4,4,4,4,4,4,4,4,4,4,4,4,4,2,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
-    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4 };
-  for( int i = pos; i <= pos + size - 4; i += table[buffer[i+3]] )
-    if( buffer[i] == 'L' && buffer[i+1] == 'Z' &&
-        buffer[i+2] == 'I' && buffer[i+3] == 'P' )
-      return i;			// magic string found
-  return pos + size;
-  }
-
-struct Splitter_arg
-  {
-  unsigned long long cfile_size;
-  Packet_courier * courier;
-  const Pretty_print * pp;
-  int infd;
-  unsigned dictionary_size;	// returned by splitter to main thread
-  };
-
-// split data from input file into chunks and pass them to
-// courier for packaging and distribution to workers.
-extern "C" void * dsplitter_s( void * arg )
-  {
-  Splitter_arg & tmp = *(Splitter_arg *)arg;
-  Packet_courier & courier = *tmp.courier;
-  const Pretty_print & pp = *tmp.pp;
-  const int infd = tmp.infd;
-  const int hsize = Lzip_header::size;
-  const int tsize = Lzip_trailer::size;
-  const int buffer_size = max_packet_size;
-  const int base_buffer_size = tsize + buffer_size + hsize;
-  uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size];
-  if( !base_buffer ) { pp( "Not enough memory." ); cleanup_and_fail(); }
-  uint8_t * const buffer = base_buffer + tsize;
-
-  int size = readblock( infd, buffer, buffer_size + hsize ) - hsize;
-  bool at_stream_end = ( size < buffer_size );
-  if( size != buffer_size && errno )
-    { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
-  if( size + hsize < min_member_size )
-    { show_file_error( pp.name(), "Input file is too short." );
-      cleanup_and_fail( 2 ); }
-  const Lzip_header & header = *(const Lzip_header *)buffer;
-  if( !header.verify_magic() )
-    { show_file_error( pp.name(), bad_magic_msg ); cleanup_and_fail( 2 ); }
-  if( !header.verify_version() )
-    { pp( bad_version( header.version() ) ); cleanup_and_fail( 2 ); }
-  tmp.dictionary_size = header.dictionary_size();
-  if( !isvalid_ds( tmp.dictionary_size ) )
-    { pp( bad_dict_msg ); cleanup_and_fail( 2 ); }
-  if( verbosity >= 1 ) pp();
-  show_progress( 0, tmp.cfile_size, &pp );	// init
-
-  unsigned long long partial_member_size = 0;
-  while( true )
-    {
-    int pos = 0;
-    for( int newpos = 1; newpos <= size; ++newpos )
-      {
-      newpos = find_magic( buffer, newpos, size + 4 - newpos );
-      if( newpos <= size )
-        {
-        const Lzip_trailer & trailer =
-          *(const Lzip_trailer *)(buffer + newpos - tsize);
-        const unsigned long long member_size = trailer.member_size();
-        if( partial_member_size + newpos - pos == member_size )
-          {						// header found
-          const Lzip_header & header = *(const Lzip_header *)(buffer + newpos);
-          if( !header.verify_version() )
-            { pp( bad_version( header.version() ) ); cleanup_and_fail( 2 ); }
-          const unsigned dictionary_size = header.dictionary_size();
-          if( !isvalid_ds( dictionary_size ) )
-            { pp( bad_dict_msg ); cleanup_and_fail( 2 ); }
-          uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos];
-          if( !data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
-          std::memcpy( data, buffer + pos, newpos - pos );
-          courier.receive_packet( data, newpos - pos );
-          courier.receive_packet( 0, 0 );	// end of member token
-          partial_member_size = 0;
-          pos = newpos;
-          show_progress( member_size );
-          }
-        }
-      }
-
-    if( at_stream_end )
-      {
-      uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos];
-      if( !data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
-      std::memcpy( data, buffer + pos, size + hsize - pos );
-      courier.receive_packet( data, size + hsize - pos );
-      courier.receive_packet( 0, 0 );		// end of member token
-      break;
-      }
-    if( pos < buffer_size )
-      {
-      partial_member_size += buffer_size - pos;
-      uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos];
-      if( !data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
-      std::memcpy( data, buffer + pos, buffer_size - pos );
-      courier.receive_packet( data, buffer_size - pos );
-      }
-    std::memcpy( base_buffer, base_buffer + buffer_size, tsize + hsize );
-    size = readblock( infd, buffer + hsize, buffer_size );
-    at_stream_end = ( size < buffer_size );
-    if( size != buffer_size && errno )
-      { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
-    }
-  delete[] base_buffer;
-  courier.finish();			// no more packets to send
-  return 0;
-  }
 struct Worker_arg
   {
   Packet_courier * courier;
   const Pretty_print * pp;
+  Shared_retval * shared_retval;
   int worker_id;
   bool ignore_trailing;
   bool loose_trailing;
   bool testing;
+  bool nocopy;		// avoid copying decompressed data when testing
   };

+struct Splitter_arg
+  {
+  struct Worker_arg worker_arg;
+  Worker_arg * worker_args;
+  pthread_t * worker_threads;
+  unsigned long long cfile_size;
+  int infd;
+  unsigned dictionary_size;	// returned by splitter to main thread
+  int num_workers;		// returned by splitter to main thread
+  };

-// consume packets from courier, decompress their contents and,
-// if not testing, give the produced packets to courier.
+/* Consume packets from courier, decompress their contents and, if not
+   testing, give to courier the packets produced. */
 extern "C" void * dworker_s( void * arg )
   {
   const Worker_arg & tmp = *(const Worker_arg *)arg;
   Packet_courier & courier = *tmp.courier;
   const Pretty_print & pp = *tmp.pp;
+  Shared_retval & shared_retval = *tmp.shared_retval;
   const int worker_id = tmp.worker_id;
   const bool ignore_trailing = tmp.ignore_trailing;
   const bool loose_trailing = tmp.loose_trailing;
   const bool testing = tmp.testing;
+  const bool nocopy = tmp.nocopy;

-  uint8_t * new_data = new( std::nothrow ) uint8_t[max_packet_size];
-  LZ_Decoder * const decoder = LZ_decompress_open();
-  if( !new_data || !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
-    { pp( "Not enough memory." ); cleanup_and_fail(); }
-  unsigned long long partial_out_size = 0;
+  unsigned long long partial_in_size = 0, partial_out_size = 0;
   int new_pos = 0;
-  bool trailing_data_found = false;
+  bool draining = false;	// either trailing data or an error were found
+  uint8_t * new_data = 0;
+  LZ_Decoder * const decoder = LZ_decompress_open();
+  if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
+    { draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg ); }
   while( true )
     {
     const Packet * const ipacket = courier.distribute_packet( worker_id );
     if( !ipacket ) break;		// no more packets to process
-    if( !ipacket->data ) LZ_decompress_finish( decoder );
     int written = 0;
-    while( !trailing_data_found )
+    while( !draining )	// else discard trailing data or drain queue
       {
       if( LZ_decompress_write_size( decoder ) > 0 && written < ipacket->size )
         {
@@ -389,85 +306,255 @@ extern "C" void * dworker_s( void * arg )
         if( written > ipacket->size )
           internal_error( "ipacket size exceeded in worker." );
         }
-      while( !trailing_data_found )	// read and pack decompressed data
+      if( ipacket->eom && written == ipacket->size )
+        LZ_decompress_finish( decoder );
+      unsigned long long total_in = 0; // detect empty member + corrupt header
+      while( !draining )		// read and pack decompressed data
         {
-        const int rd = LZ_decompress_read( decoder, new_data + new_pos,
-                                           max_packet_size - new_pos );
-        if( rd < 0 )
+        if( !nocopy && !new_data &&
+            !( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) )
+          { draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg );
+            break; }
+        const int rd = LZ_decompress_read( decoder,
+                                           nocopy ? 0 : new_data + new_pos,
+                                           max_packet_size - new_pos );
+        if( rd < 0 )			// trailing data or decoder error
          {
+          draining = true;
          const enum LZ_Errno lz_errno = LZ_decompress_errno( decoder );
          if( lz_errno == LZ_header_error )
            {
-            trailing_data_found = true;
+            courier.set_trailing_flag();
            if( !ignore_trailing )
-              { pp( trailing_msg ); cleanup_and_fail( 2 ); }
+              { if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); }
            }
          else if( lz_errno == LZ_data_error &&
                   LZ_decompress_member_position( decoder ) == 0 )
            {
-            trailing_data_found = true;
+            courier.set_trailing_flag();
            if( !loose_trailing )
-              { pp( corrupt_mm_msg ); cleanup_and_fail( 2 ); }
+              { if( shared_retval.set_value( 2 ) ) pp( corrupt_mm_msg ); }
            else if( !ignore_trailing )
-              { pp( trailing_msg ); cleanup_and_fail( 2 ); }
+              { if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); }
            }
          else
-            cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
+            decompress_error( decoder, pp, shared_retval, worker_id );
          }
        else new_pos += rd;
        if( new_pos > max_packet_size )
          internal_error( "opacket size exceeded in worker." );
-        if( new_pos == max_packet_size || trailing_data_found ||
-            LZ_decompress_finished( decoder ) == 1 )
+        if( LZ_decompress_member_finished( decoder ) == 1 )
          {
-          if( !testing && new_pos > 0 )	// make data packet
+          partial_in_size += LZ_decompress_member_position( decoder );
+          partial_out_size += LZ_decompress_data_position( decoder );
+          }
+        const bool eom = draining || LZ_decompress_finished( decoder ) == 1;
+        if( new_pos == max_packet_size || eom )
          {
-            Packet * const opacket = new Packet( new_data, new_pos );
+          if( !testing )		// make data packet
+            {
+            const Packet * const opacket =
+              new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
            courier.collect_packet( opacket, worker_id );
-            new_data = new( std::nothrow ) uint8_t[max_packet_size];
-            if( !new_data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
+            if( new_pos > 0 ) new_data = 0;
            }
-          partial_out_size += new_pos;
          new_pos = 0;
-          if( trailing_data_found || LZ_decompress_finished( decoder ) == 1 )
+          if( eom )
+            { LZ_decompress_reset( decoder );	// prepare for new member
+              break; }
+          }
+        if( rd == 0 )
          {
-            if( !testing )		// end of member token
-              courier.collect_packet( new Packet, worker_id );
-            LZ_decompress_reset( decoder );	// prepare for new member
-            break;
+          const unsigned long long size = LZ_decompress_total_in_size( decoder );
+          if( total_in == size ) break; else total_in = size;
          }
        }
-        if( rd == 0 ) break;
-        }
      if( !ipacket->data || written == ipacket->size ) break;
      }
-    if( ipacket->data ) delete[] ipacket->data;
    delete ipacket;
    }
-  delete[] new_data;
-  courier.add_out_size( partial_out_size );
-  if( LZ_decompress_member_position( decoder ) != 0 )
-    { pp( "Error, some data remains in decoder." ); cleanup_and_fail(); }
-  if( LZ_decompress_close( decoder ) < 0 )
-    { pp( "LZ_decompress_close failed." ); cleanup_and_fail(); }
+  if( new_data ) delete[] new_data;
+  courier.add_sizes( partial_in_size, partial_out_size );
+  if( LZ_decompress_member_position( decoder ) != 0 &&
+      shared_retval.set_value( 1 ) )
+    pp( "Error, some data remains in decoder." );
+  if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
+    pp( "LZ_decompress_close failed." );
   return 0;
   }
-// get from courier the processed and sorted packets, and write
-// their contents to the output file.
-void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
+bool start_worker( const Worker_arg & worker_arg,
+                   Worker_arg * const worker_args,
+                   pthread_t * const worker_threads, const int worker_id,
+                   Shared_retval & shared_retval )
+  {
+  worker_args[worker_id] = worker_arg;
+  worker_args[worker_id].worker_id = worker_id;
+  const int errcode = pthread_create( &worker_threads[worker_id], 0,
+                                      dworker_s, &worker_args[worker_id] );
+  if( errcode && shared_retval.set_value( 1 ) )
+    show_error( "Can't create worker threads", errcode );
+  return errcode == 0;
+  }

+/* Split data from input file into chunks and pass them to courier for
+   packaging and distribution to workers.
+   Start a worker per member up to a maximum of num_workers. */
+extern "C" void * dsplitter_s( void * arg )
+  {
+  Splitter_arg & tmp = *(Splitter_arg *)arg;
+  const Worker_arg & worker_arg = tmp.worker_arg;
+  Packet_courier & courier = *worker_arg.courier;
+  const Pretty_print & pp = *worker_arg.pp;
+  Shared_retval & shared_retval = *worker_arg.shared_retval;
+  Worker_arg * const worker_args = tmp.worker_args;
+  pthread_t * const worker_threads = tmp.worker_threads;
+  const int infd = tmp.infd;
+  int worker_id = 0;			// number of workers started
+  const int hsize = Lzip_header::size;
+  const int tsize = Lzip_trailer::size;
+  const int buffer_size = max_packet_size;
+  // buffer with room for trailer, header, data, and sentinel "LZIP"
+  const int base_buffer_size = tsize + hsize + buffer_size + 4;
+  uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size];
+  if( !base_buffer )
+    {
+mem_fail:
+    if( shared_retval.set_value( 1 ) ) pp( mem_msg );
+fail:
+    delete[] base_buffer;
+    courier.finish( worker_id );	// no more packets to send
+    tmp.num_workers = worker_id;
+    return 0;
+    }
+  uint8_t * const buffer = base_buffer + tsize;
+
+  int size = readblock( infd, buffer, buffer_size + hsize ) - hsize;
+  bool at_stream_end = ( size < buffer_size );
+  if( size != buffer_size && errno )
+    { if( shared_retval.set_value( 1 ) )
+        { pp(); show_error( "Read error", errno ); } goto fail; }
+  if( size + hsize < min_member_size )
+    { if( shared_retval.set_value( 2 ) ) show_file_error( pp.name(),
+        ( size <= 0 ) ? "File ends unexpectedly at member header." :
+        "Input file is too short." ); goto fail; }
+  const Lzip_header & header = *(const Lzip_header *)buffer;
+  if( !header.verify_magic() )
+    { if( shared_retval.set_value( 2 ) )
+        { show_file_error( pp.name(), bad_magic_msg ); } goto fail; }
+  if( !header.verify_version() )
+    { if( shared_retval.set_value( 2 ) )
+        { pp( bad_version( header.version() ) ); } goto fail; }
+  tmp.dictionary_size = header.dictionary_size();
+  if( !isvalid_ds( tmp.dictionary_size ) )
+    { if( shared_retval.set_value( 2 ) ) { pp( bad_dict_msg ); } goto fail; }
+  if( verbosity >= 1 ) pp();
+  show_progress( 0, tmp.cfile_size, &pp );	// init
+
+  unsigned long long partial_member_size = 0;
+  bool worker_pending = true;	// start 1 worker per first packet of member
+  while( true )
+    {
+    if( shared_retval() ) break;	// stop sending packets on error
+    int pos = 0;			// current searching position
+    std::memcpy( buffer + hsize + size, lzip_magic, 4 );	// sentinel
+    for( int newpos = 1; newpos <= size; ++newpos )
+      {
+      while( buffer[newpos]   != lzip_magic[0] ||
+             buffer[newpos+1] != lzip_magic[1] ||
+             buffer[newpos+2] != lzip_magic[2] ||
+             buffer[newpos+3] != lzip_magic[3] ) ++newpos;
+      if( newpos <= size )
+        {
+        const Lzip_trailer & trailer =
+          *(const Lzip_trailer *)(buffer + newpos - tsize);
+        const unsigned long long member_size = trailer.member_size();
+        if( partial_member_size + newpos - pos == member_size &&
+            trailer.verify_consistency() )
+          {						// header found
+          const Lzip_header & header = *(const Lzip_header *)(buffer + newpos);
+          if( !header.verify_version() )
+            { if( shared_retval.set_value( 2 ) )
+                { pp( bad_version( header.version() ) ); } goto fail; }
+          const unsigned dictionary_size = header.dictionary_size();
+          if( !isvalid_ds( dictionary_size ) )
+            { if( shared_retval.set_value( 2 ) ) pp( bad_dict_msg );
+              goto fail; }
+          if( tmp.dictionary_size < dictionary_size )
+            tmp.dictionary_size = dictionary_size;
+          uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos];
+          if( !data ) goto mem_fail;
+          std::memcpy( data, buffer + pos, newpos - pos );
+          courier.receive_packet( data, newpos - pos, true );	// eom
+          partial_member_size = 0;
+          pos = newpos;
+          if( worker_pending )
+            { if( !start_worker( worker_arg, worker_args, worker_threads,
+                                 worker_id, shared_retval ) ) goto fail;
+              ++worker_id; }
+          worker_pending = worker_id < tmp.num_workers;
+          show_progress( member_size );
+          }
+        }
+      }
+
+    if( at_stream_end )
+      {
+      uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos];
if( !data ) goto mem_fail;
std::memcpy( data, buffer + pos, size + hsize - pos );
courier.receive_packet( data, size + hsize - pos, true ); // eom
if( worker_pending &&
start_worker( worker_arg, worker_args, worker_threads,
worker_id, shared_retval ) ) ++worker_id;
break;
}
if( pos < buffer_size )
{
partial_member_size += buffer_size - pos;
uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos];
if( !data ) goto mem_fail;
std::memcpy( data, buffer + pos, buffer_size - pos );
courier.receive_packet( data, buffer_size - pos, false );
if( worker_pending )
{ if( !start_worker( worker_arg, worker_args, worker_threads,
worker_id, shared_retval ) ) break;
++worker_id; worker_pending = false; }
}
if( courier.trailing_data_found() ) break;
std::memcpy( base_buffer, base_buffer + buffer_size, tsize + hsize );
size = readblock( infd, buffer + hsize, buffer_size );
at_stream_end = ( size < buffer_size );
if( size != buffer_size && errno )
{ if( shared_retval.set_value( 1 ) )
{ pp(); show_error( "Read error", errno ); } break; }
}
delete[] base_buffer;
courier.finish( worker_id ); // no more packets to send
tmp.num_workers = worker_id;
return 0;
}
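The ChangeLog entry "Use plain comparison instead of Boyer-Moore to search for headers" refers to the scan loop above: the four magic bytes are copied just past the data as a sentinel, so the inner loop needs no per-byte bounds check and always terminates. The sketch below is a hypothetical stand-alone version of that technique (the function name and buffer handling are illustrative, not plzip's own code):

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Return the offsets of every "LZIP" magic string in the first 'size' bytes.
// A sentinel copy of the magic is appended after the data, guaranteeing that
// the inner scan loop stops at pos == size at the latest.
std::vector<int> find_magic( std::vector<unsigned char> buf, const int size )
  {
  static const unsigned char magic[4] = { 'L', 'Z', 'I', 'P' };
  buf.resize( size + 4 );
  std::memcpy( buf.data() + size, magic, 4 );		// sentinel copy
  std::vector<int> positions;
  for( int pos = 0; pos <= size; ++pos )
    {
    // no bounds check needed; the sentinel always matches
    while( buf[pos] != magic[0] || buf[pos+1] != magic[1] ||
           buf[pos+2] != magic[2] || buf[pos+3] != magic[3] ) ++pos;
    if( pos < size ) positions.push_back( pos );  // skip the sentinel itself
    }
  return positions;
  }
```

The splitter does the same scan in place over its packet buffer, treating each match position as a candidate member boundary to be confirmed against the preceding trailer.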
/* Get from courier the processed and sorted packets, and write their
contents to the output file. Drain queue on error.
*/
void muxer( Packet_courier & courier, const Pretty_print & pp,
Shared_retval & shared_retval, const int outfd )
  {
  while( true )
    {
    const Packet * const opacket = courier.deliver_packet();
    if( !opacket ) break;	// queue is empty. all workers exited
    if( shared_retval() == 0 &&
        writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
        shared_retval.set_value( 1 ) )
      { pp(); show_error( "Write error", errno ); }
    delete opacket;
    }
  }
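This commit replaces the old `cleanup_and_fail()` calls with a "first error wins" latch: `set_value` stores a nonzero return value only once, so exactly one thread prints a diagnostic while the others drain and exit. As a hedged sketch (the class and member names below are illustrative; plzip's actual `Shared_retval` is defined elsewhere in its sources), the behaviour can be modelled like this:

```cpp
#include <cassert>
#include <mutex>

// First-error latch: set_value() stores a nonzero value only once and
// reports whether this call was the one that stored it, so only one
// thread prints a message. operator() polls the current value.
class First_error
  {
  std::mutex mtx;
  int retval = 0;
public:
  bool set_value( const int val )	// true if this call stored the value
    {
    std::lock_guard<std::mutex> lock( mtx );
    if( retval != 0 || val == 0 ) return false;
    retval = val; return true;
    }
  int operator()()			// current value; 0 means no error yet
    {
    std::lock_guard<std::mutex> lock( mtx );
    return retval;
    }
  };
```

This is why the muxer above can test `shared_retval() == 0 && ... && shared_retval.set_value( 1 )` in one expression: the write is skipped once any thread has failed, and the diagnostic is printed only by the thread that wins the latch.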
@@ -475,8 +562,9 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
} // end namespace


/* Init the courier, then start the splitter and the workers and, if not
   testing, call the muxer.
*/
int dec_stream( const unsigned long long cfile_size,
                const int num_workers, const int infd, const int outfd,
                const Pretty_print & pp, const int debug_level,
@@ -487,77 +575,76 @@ int dec_stream( const unsigned long long cfile_size,
                     num_workers * in_slots : INT_MAX;
  in_size = 0;
  out_size = 0;
  Shared_retval shared_retval;
  Packet_courier courier( shared_retval, num_workers, total_in_slots, out_slots );
  if( debug_level & 2 ) std::fputs( "decompress stream.\n", stderr );

  Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
  pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
  if( !worker_args || !worker_threads )
    { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }

#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
  const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 );
#else
  const bool nocopy = false;
#endif

  Splitter_arg splitter_arg;
  splitter_arg.worker_arg.courier = &courier;
  splitter_arg.worker_arg.pp = &pp;
  splitter_arg.worker_arg.shared_retval = &shared_retval;
  splitter_arg.worker_arg.worker_id = 0;
  splitter_arg.worker_arg.ignore_trailing = ignore_trailing;
  splitter_arg.worker_arg.loose_trailing = loose_trailing;
  splitter_arg.worker_arg.testing = ( outfd < 0 );
  splitter_arg.worker_arg.nocopy = nocopy;
  splitter_arg.worker_args = worker_args;
  splitter_arg.worker_threads = worker_threads;
  splitter_arg.cfile_size = cfile_size;
  splitter_arg.infd = infd;
  splitter_arg.num_workers = num_workers;

  pthread_t splitter_thread;
  int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg );
  if( errcode )
    { show_error( "Can't create splitter thread", errcode );
      delete[] worker_threads; delete[] worker_args; return 1; }

  if( outfd >= 0 ) muxer( courier, pp, shared_retval, outfd );

  errcode = pthread_join( splitter_thread, 0 );
  if( errcode && shared_retval.set_value( 1 ) )
    show_error( "Can't join splitter thread", errcode );

  for( int i = splitter_arg.num_workers; --i >= 0; )
    {				// join only the workers started
    errcode = pthread_join( worker_threads[i], 0 );
    if( errcode && shared_retval.set_value( 1 ) )
      show_error( "Can't join worker threads", errcode );
    }
  delete[] worker_threads;
  delete[] worker_args;

  if( shared_retval() ) return shared_retval();	// some thread found a problem

  show_results( in_size, out_size, splitter_arg.dictionary_size, outfd < 0 );

  if( debug_level & 1 )
    {
    std::fprintf( stderr,
      "workers started                           %8u\n"
      "any worker tried to consume from splitter %8u times\n"
      "any worker had to wait                    %8u times\n",
      splitter_arg.num_workers,
      courier.icheck_counter, courier.iwait_counter );
    if( outfd >= 0 )
      std::fprintf( stderr,
        "muxer tried to consume from workers       %8u times\n"
        "muxer had to wait                         %8u times\n",
        courier.ocheck_counter, courier.owait_counter );
    }

  if( !courier.finished() ) internal_error( "courier not finished." );
  return 0;

View file

@@ -1,6 +1,6 @@
/* Plzip - Massively parallel implementation of lzip
   Copyright (C) 2009 Laszlo Ersek.
   Copyright (C) 2009-2021 Antonio Diaz Diaz.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
@@ -27,7 +27,6 @@
#include <cstring>
#include <string>
#include <vector>
#include <stdint.h>
#include <unistd.h>
#include <sys/stat.h>
@@ -37,8 +36,9 @@
#include "lzip_index.h"

/* This code is based on a patch by Hannes Domani, <ssbssa@yahoo.de> to make
   possible compiling plzip under MS Windows (with MINGW compiler).
*/
#if defined(__MSVCRT__) && defined(WITH_MINGW)
#include <windows.h>
#warning "Parallel I/O is not guaranteed to work on Windows."
@@ -76,9 +76,9 @@ ssize_t pwrite( int fd, const void *buf, size_t count, uint64_t offset )
#endif	// __MSVCRT__

/* Returns the number of bytes really read.
   If (returned value < size) and (errno == 0), means EOF was reached.
*/
int preadblock( const int fd, uint8_t * const buf, const int size,
                const long long pos )
  {
@@ -96,9 +96,9 @@ int preadblock( const int fd, uint8_t * const buf, const int size,
  }

/* Returns the number of bytes really written.
   If (returned value < size), it is always an error.
*/
int pwriteblock( const int fd, const uint8_t * const buf, const int size,
                 const long long pos )
  {
@@ -115,18 +115,39 @@ int pwriteblock( const int fd, const uint8_t * const buf, const int size,
  }

void decompress_error( struct LZ_Decoder * const decoder,
                       const Pretty_print & pp,
                       Shared_retval & shared_retval, const int worker_id )
  {
  const LZ_Errno errcode = LZ_decompress_errno( decoder );
  const int retval = ( errcode == LZ_header_error || errcode == LZ_data_error ||
                       errcode == LZ_unexpected_eof ) ? 2 : 1;
  if( !shared_retval.set_value( retval ) ) return;
  pp();
  if( verbosity >= 0 )
    std::fprintf( stderr, "%s in worker %d\n", LZ_strerror( errcode ),
                  worker_id );
  }


void show_results( const unsigned long long in_size,
                   const unsigned long long out_size,
                   const unsigned dictionary_size, const bool testing )
  {
  if( verbosity >= 2 )
    {
    if( verbosity >= 4 ) show_header( dictionary_size );
    if( out_size == 0 || in_size == 0 )
      std::fputs( "no data compressed. ", stderr );
    else
      std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
                    (double)out_size / in_size,
                    ( 100.0 * in_size ) / out_size,
                    100.0 - ( ( 100.0 * in_size ) / out_size ) );
    if( verbosity >= 3 )
      std::fprintf( stderr, "%9llu out, %8llu in. ", out_size, in_size );
    }
  if( verbosity >= 1 ) std::fputs( testing ? "ok\n" : "done\n", stderr );
  }
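The three figures printed by `show_results` all derive from the two byte counts: for example, 40 bytes of compressed input expanding to 100 bytes of data would print a 2.500:1 ratio, 40.00% ratio, and 60.00% saved. A minimal check of that arithmetic (the struct and function names are illustrative, not part of plzip):

```cpp
#include <cassert>
#include <cmath>

// The three figures show_results derives from the byte counts.
struct Figures { double ratio, percent, saved; };

Figures figures( const double in_size, const double out_size )
  {
  Figures f;
  f.ratio = out_size / in_size;			// printed as e.g. "2.500:1"
  f.percent = 100.0 * in_size / out_size;	// compressed size as % of data
  f.saved = 100.0 - f.percent;			// space saved
  return f;
  }
```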
@@ -136,32 +157,38 @@ struct Worker_arg
  {
  const Lzip_index * lzip_index;
  const Pretty_print * pp;
  Shared_retval * shared_retval;
  int worker_id;
  int num_workers;
  int infd;
  int outfd;
  bool nocopy;		// avoid copying decompressed data when testing
  };


/* Read members from input file, decompress their contents, and write to
   output file the data produced.
*/
extern "C" void * dworker( void * arg )
  {
  const Worker_arg & tmp = *(const Worker_arg *)arg;
  const Lzip_index & lzip_index = *tmp.lzip_index;
  const Pretty_print & pp = *tmp.pp;
  Shared_retval & shared_retval = *tmp.shared_retval;
  const int worker_id = tmp.worker_id;
  const int num_workers = tmp.num_workers;
  const int infd = tmp.infd;
  const int outfd = tmp.outfd;
  const bool nocopy = tmp.nocopy;
  const int buffer_size = 65536;
  uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size];
  uint8_t * const obuffer =
    nocopy ? 0 : new( std::nothrow ) uint8_t[buffer_size];
  LZ_Decoder * const decoder = LZ_decompress_open();
  if( !ibuffer || ( !nocopy && !obuffer ) || !decoder ||
      LZ_decompress_errno( decoder ) != LZ_ok )
    { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }

  for( long i = worker_id; i < lzip_index.members(); i += num_workers )
    {
@@ -172,6 +199,7 @@ extern "C" void * dworker( void * arg )

    while( member_rest > 0 )
      {
      if( shared_retval() ) goto done;	// other worker found a problem
      while( LZ_decompress_write_size( decoder ) > 0 )
        {
        const int size = std::min( LZ_decompress_write_size( decoder ),
@@ -179,7 +207,8 @@ extern "C" void * dworker( void * arg )
        if( size > 0 )
          {
          if( preadblock( infd, ibuffer, size, member_pos ) != size )
            { if( shared_retval.set_value( 1 ) )
              { pp(); show_error( "Read error", errno ); } goto done; }
          member_pos += size;
          member_rest -= size;
          if( LZ_decompress_write( decoder, ibuffer, size ) != size )
@@ -191,17 +220,18 @@ extern "C" void * dworker( void * arg )
        {
        const int rd = LZ_decompress_read( decoder, obuffer, buffer_size );
        if( rd < 0 )
          { decompress_error( decoder, pp, shared_retval, worker_id );
            goto done; }
        if( rd > 0 && outfd >= 0 )
          {
          const int wr = pwriteblock( outfd, obuffer, rd, data_pos );
          if( wr != rd )
            {
            if( shared_retval.set_value( 1 ) ) { pp();
              if( verbosity >= 0 )
                std::fprintf( stderr, "Write error in worker %d: %s\n",
                              worker_id, std::strerror( errno ) ); }
            goto done;
            }
          }
        if( rd > 0 )
@@ -221,98 +251,114 @@ extern "C" void * dworker( void * arg )
      }
    show_progress( lzip_index.mblock( i ).size() );
    }
done:
  if( obuffer ) { delete[] obuffer; } delete[] ibuffer;
  if( LZ_decompress_member_position( decoder ) != 0 &&
      shared_retval.set_value( 1 ) )
    pp( "Error, some data remains in decoder." );
  if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
    pp( "LZ_decompress_close failed." );
  return 0;
  }

} // end namespace


// start the workers and wait for them to finish.
int decompress( const unsigned long long cfile_size, int num_workers,
                const int infd, const int outfd, const Pretty_print & pp,
                const int debug_level, const int in_slots,
                const int out_slots, const bool ignore_trailing,
                const bool loose_trailing, const bool infd_isreg,
                const bool one_to_one )
  {
  if( !infd_isreg )
    return dec_stream( cfile_size, num_workers, infd, outfd, pp, debug_level,
                       in_slots, out_slots, ignore_trailing, loose_trailing );

  const Lzip_index lzip_index( infd, ignore_trailing, loose_trailing );
  if( lzip_index.retval() == 1 )	// decompress as stream if seek fails
    {
    lseek( infd, 0, SEEK_SET );
    return dec_stream( cfile_size, num_workers, infd, outfd, pp, debug_level,
                       in_slots, out_slots, ignore_trailing, loose_trailing );
    }
  if( lzip_index.retval() != 0 )	// corrupt or invalid input file
    {
    if( lzip_index.bad_magic() )
      show_file_error( pp.name(), lzip_index.error().c_str() );
    else pp( lzip_index.error().c_str() );
    return lzip_index.retval();
    }

  if( num_workers > lzip_index.members() ) num_workers = lzip_index.members();

  if( outfd >= 0 )
    {
    struct stat st;
    if( !one_to_one || fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) ||
        lseek( outfd, 0, SEEK_CUR ) < 0 )
      {
      if( debug_level & 2 ) std::fputs( "decompress file to stdout.\n", stderr );
      if( verbosity >= 1 ) pp();
      show_progress( 0, cfile_size, &pp );	// init
      return dec_stdout( num_workers, infd, outfd, pp, debug_level, out_slots,
                         lzip_index );
      }
    }

  if( debug_level & 2 ) std::fputs( "decompress file to file.\n", stderr );
  if( verbosity >= 1 ) pp();
  show_progress( 0, cfile_size, &pp );		// init
  Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
  pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
  if( !worker_args || !worker_threads )
    { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }

#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
  const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 );
#else
  const bool nocopy = false;
#endif
  Shared_retval shared_retval;
  int i = 0;				// number of workers started
  for( ; i < num_workers; ++i )
    {
    worker_args[i].lzip_index = &lzip_index;
    worker_args[i].pp = &pp;
    worker_args[i].shared_retval = &shared_retval;
    worker_args[i].worker_id = i;
    worker_args[i].num_workers = num_workers;
    worker_args[i].infd = infd;
    worker_args[i].outfd = outfd;
    worker_args[i].nocopy = nocopy;
    const int errcode =
      pthread_create( &worker_threads[i], 0, dworker, &worker_args[i] );
    if( errcode )
      { if( shared_retval.set_value( 1 ) )
        { show_error( "Can't create worker threads", errcode ); } break; }
    }

  while( --i >= 0 )
    {
    const int errcode = pthread_join( worker_threads[i], 0 );
    if( errcode && shared_retval.set_value( 1 ) )
      show_error( "Can't join worker threads", errcode );
    }
  delete[] worker_threads;
  delete[] worker_args;

  if( shared_retval() ) return shared_retval();	// some thread found a problem

  if( verbosity >= 1 )
    show_results( lzip_index.cdata_size(), lzip_index.udata_size(),
                  lzip_index.dictionary_size(), outfd < 0 );

  if( debug_level & 1 )
    std::fprintf( stderr,
      "workers started                           %8u\n", num_workers );

  return 0;
  }
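In the regular-file path above, `dworker`'s loop `for( long i = worker_id; i < lzip_index.members(); i += num_workers )` deals members out round-robin: member i goes to worker i % num_workers, with no further coordination needed. A hypothetical helper (the function name is illustrative) makes the distribution explicit:

```cpp
#include <cassert>
#include <vector>

// Members a given worker decompresses under the round-robin rule used by
// dworker: worker_id, worker_id + num_workers, worker_id + 2*num_workers, ...
std::vector<long> members_of_worker( const int worker_id,
                                     const int num_workers,
                                     const long members )
  {
  std::vector<long> v;
  for( long i = worker_id; i < members; i += num_workers ) v.push_back( i );
  return v;
  }
```

For a 10-member file and 3 workers, worker 1 handles members 1, 4, and 7, which is why `num_workers` is clamped to `lzip_index.members()`: any extra worker would get an empty slice.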

View file

@@ -1,5 +1,5 @@
.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.16.
.TH PLZIP "1" "January 2021" "plzip 1.9" "User Commands"
.SH NAME
plzip \- reduces the size of files
.SH SYNOPSIS
@@ -7,22 +7,24 @@ plzip \- reduces the size of files
[\fI\,options\/\fR] [\fI\,files\/\fR]
.SH DESCRIPTION
Plzip is a massively parallel (multi\-threaded) implementation of lzip, fully
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
.PP
Lzip is a lossless data compressor with a user interface similar to the one
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov
chain\-Algorithm' (LZMA) stream format, chosen to maximize safety and
interoperability. Lzip can compress about as fast as gzip (lzip \fB\-0\fR) or
compress most files more than bzip2 (lzip \fB\-9\fR). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
a data recovery perspective. Lzip has been designed, written, and tested
with great care to replace gzip and bzip2 as the standard general\-purpose
compressed format for unix\-like systems.
.PP
Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
to 2 percent larger compressed files). Note that the number of usable
threads is limited by file size; on files larger than a few GB plzip can use
hundreds of processors, but on files of only a few MB plzip is no faster
than lzip.
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
@@ -62,7 +64,7 @@ set match length limit in bytes [36]
set number of (de)compression threads [2]
.TP
\fB\-o\fR, \fB\-\-output=\fR<file>
write to <file>, keep input files
.TP
\fB\-q\fR, \fB\-\-quiet\fR
suppress all messages
@@ -93,6 +95,9 @@ number of 1 MiB input packets buffered [4]
.TP
\fB\-\-out\-slots=\fR<n>
number of 1 MiB output packets buffered [64]
.TP
\fB\-\-check\-lib\fR
compare version of lzlib.h with liblz.{a,so}
.PP
If no file names are given, or if a file is '\-', plzip compresses or
decompresses from standard input to standard output.
@@ -103,8 +108,11 @@ to 2^29 bytes.
.PP
The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc, you may need to use the options \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR
directly to achieve optimal performance.
.PP
To extract all the files from archive 'foo.tar.lz', use the commands
\&'tar \fB\-xf\fR foo.tar.lz' or 'plzip \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'.
.PP
Exit status: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
@@ -117,8 +125,8 @@ Plzip home page: http://www.nongnu.org/lzip/plzip.html
.SH COPYRIGHT
Copyright \(co 2009 Laszlo Ersek.
.br
Copyright \(co 2021 Antonio Diaz Diaz.
Using lzlib 1.12
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.

View file

@@ -11,7 +11,7 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
Plzip Manual
************

This manual is for Plzip (version 1.9, 3 January 2021).
* Menu:
@@ -28,10 +28,10 @@ This manual is for Plzip (version 1.9, 3 January 2021).
* Concept index::        Index of concepts

Copyright (C) 2009-2021 Antonio Diaz Diaz.

This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
 
File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top

@@ -39,88 +39,89 @@ File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
1 Introduction
**************

Plzip is a massively parallel (multi-threaded) implementation of lzip, fully
compatible with lzip 1.4 or newer.  Plzip uses the compression library lzlib.

Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2.  Lzip uses a simplified form of the 'Lempel-Ziv-Markov
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
interoperability.  Lzip can compress about as fast as gzip (lzip -0) or
compress most files more than bzip2 (lzip -9).  Decompression speed is
intermediate between gzip and bzip2.  Lzip is better than gzip and bzip2 from
a data recovery perspective.  Lzip has been designed, written, and tested
with great care to replace gzip and bzip2 as the standard general-purpose
compressed format for unix-like systems.

Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
to 2 percent larger compressed files).  Note that the number of usable
threads is limited by file size; on files larger than a few GB plzip can use
hundreds of processors, but on files of only a few MB plzip is no faster
than lzip.  *Note Minimum file sizes::.
For creation and manipulation of compressed tar archives tarlz can be
more efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members. *Note tarlz manual:
(tarlz)Top.
The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder availability:

   * The lzip format provides very safe integrity checking and some data
     recovery means.  The program lziprecover can repair bit flip errors
     (one of the most common forms of data corruption) in lzip files, and
     provides data recovery capabilities, including error-checked merging
     of damaged copies of a file.  *Note Data safety: (lziprecover)Data
     safety.

   * The lzip format is as simple as possible (but not simpler).  The lzip
     manual provides the source code of a simple decompressor along with a
     detailed explanation of how it works, so that with the only help of the
     lzip manual it would be possible for a digital archaeologist to extract
     the data from a lzip file long after quantum computers eventually
     render LZMA obsolete.

   * Additionally the lzip reference implementation is copylefted, which
     guarantees that it will remain free forever.

A nice feature of the lzip format is that a corrupt byte is easier to
repair the nearer it is to the beginning of the file.  Therefore, with the
help of lziprecover, losing an entire archive just because of a corrupt
byte near the beginning is a thing of the past.

Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.

Plzip will automatically use for each file the largest dictionary size
that exceeds neither the file size nor the limit given.  Keep in mind that
the decompression memory requirement is affected at compression time by the
choice of dictionary size limit.  *Note Memory requirements::.

When compressing, plzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".  When
decompressing, plzip attempts to guess the name for the decompressed file
from that of the compressed file as follows:

     filename.lz    becomes   filename
     filename.tlz   becomes   filename.tar
     anyothername   becomes   anyothername.out
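The renaming rules above can be sketched as a small function (a hypothetical helper for illustration only; plzip itself implements these rules in C++):

```python
def output_name(name, decompress=False):
    """Guess plzip's output file name from the input name
    (a sketch of the renaming rules, not plzip's actual code)."""
    if not decompress:                      # compressing: append '.lz'
        return name + '.lz'
    if name.endswith('.lz'):                # filename.lz  -> filename
        return name[:-3]
    if name.endswith('.tlz'):               # filename.tlz -> filename.tar
        return name[:-4] + '.tar'
    return name + '.out'                    # anyothername -> anyothername.out

print(output_name('foo.tlz', decompress=True))  # -> foo.tar
```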
(De)compressing a file is much like copying or moving it; therefore plzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as 'cp -p' does.  (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).

Plzip is able to read from some types of non-regular files if either the
option '-c' or the option '-o' is specified.

Plzip will refuse to read compressed data from a terminal or write
compressed data to a terminal, as this would be entirely incomprehensible
and might leave the terminal in an abnormal state.

Plzip will correctly decompress a file which is the concatenation of two
or more compressed files.  The result is the concatenation of the
corresponding decompressed files.  Integrity testing of concatenated
compressed files is also supported.
@ -135,41 +136,40 @@ The output of plzip looks like this:
     plzip -v foo
       foo:  6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.

     plzip -tvvv foo.lz
       foo.lz:  6.676:1, 14.98% ratio, 85.02% saved.  450560 out, 67493 in. ok

The meaning of each field is as follows:

'N:1'
     The compression ratio (uncompressed_size / compressed_size), shown as
     N to 1.

'ratio'
     The inverse compression ratio (compressed_size / uncompressed_size),
     shown as a percentage.  A decimal ratio is easily obtained by moving the
     decimal point two places to the left; 14.98% = 0.1498.

'saved'
     The space saved by compression (1 - ratio), shown as a percentage.

'in'
     Size of the input data.  This is the uncompressed size when
     compressing, or the compressed size when decompressing or testing.
     Note that plzip always prints the uncompressed size before the
     compressed size when compressing, decompressing, testing, or listing.

'out'
     Size of the output data.  This is the compressed size when compressing,
     or the decompressed size when decompressing or testing.

When decompressing or testing at verbosity level 4 (-vvvv), the
dictionary size used to compress the file is also shown.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
have been compressed.  Decompressed is used to refer to data which have
undergone the process of decompression.
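The relations among these fields can be checked with a short sketch (the function name is illustrative, not part of plzip):

```python
def ratio_fields(in_size, out_size):
    """Compute the fields 'plzip -v' shows for a compression run, where
    in_size is the uncompressed size and out_size the compressed size."""
    inv = out_size / in_size                  # inverse compression ratio
    return (in_size / out_size,               # 'N:1'   compression ratio
            100.0 * inv,                      # 'ratio' as a percentage
            100.0 * (1.0 - inv))              # 'saved' as a percentage

# The sizes from the example output above.
n_to_1, ratio, saved = ratio_fields(450560, 67493)
print('%.3f:1, %.2f%% ratio, %.2f%% saved' % (n_to_1, ratio, saved))
```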
 
File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top
@ -181,11 +181,13 @@ The format for running plzip is:
     plzip [OPTIONS] [FILES]

If no file names are specified, plzip compresses (or decompresses) from
standard input to standard output.  A hyphen '-' used as a FILE argument
means standard input.  It can be mixed with other FILES and is read just
once, the first time it appears in the command line.

plzip supports the following options:  *Note Argument syntax:
(arg_parser)Argument syntax.
'-h'
'--help'
@ -199,32 +201,33 @@ command line.
'-a'
'--trailing-error'
     Exit with error status 2 if any remaining input is detected after
     decompressing the last member.  Such remaining input is usually trailing
     garbage that can be safely ignored.  *Note concat-example::.

'-B BYTES'
'--data-size=BYTES'
     When compressing, set the size of the input data blocks in bytes.  The
     input file will be divided in chunks of this size before compression is
     performed.  Valid values range from 8 KiB to 1 GiB.  Default value is
     two times the dictionary size, except for option '-0' where it
     defaults to 1 MiB.  Plzip will reduce the dictionary size if it is
     larger than the data size specified.  *Note Minimum file sizes::.
'-c'
'--stdout'
     Compress or decompress to standard output; keep input files unchanged.
     If compressing several files, each file is compressed independently.
     This option (or '-o') is needed when reading from a named pipe (fifo)
     or from a device.  Use 'lziprecover -cd -i' to recover as much of the
     decompressed data as possible when decompressing a corrupt file.  '-c'
     overrides '-o'.  '-c' has no effect when testing or listing.

'-d'
'--decompress'
     Decompress the files specified.  If a file does not exist or can't be
     opened, plzip continues decompressing the rest of the files.  If a file
     fails to decompress, or is a terminal, plzip exits immediately without
     decompressing the rest of the files.
'-f'
'--force'
@ -232,59 +235,69 @@ command line.
'-F'
'--recompress'
     When compressing, force re-compression of files whose name already has
     the '.lz' or '.tlz' suffix.

'-k'
'--keep'
     Keep (don't delete) input files during compression or decompression.

'-l'
'--list'
     Print the uncompressed size, compressed size, and percentage saved of
     the files specified.  Trailing data are ignored.  The values produced
     are correct even for multimember files.  If more than one file is
     given, a final line containing the cumulative sizes is printed.  With
     '-v', the dictionary size, the number of members in the file, and the
     amount of trailing data (if any) are also printed.  With '-vv', the
     positions and sizes of each member in multimember files are also
     printed.

     '-lq' can be used to verify quickly (without decompressing) the
     structural integrity of the files specified.  (Use '--test' to verify
     the data integrity).  '-alq' additionally verifies that none of the
     files specified contain trailing data.

'-m BYTES'
'--match-length=BYTES'
     When compressing, set the match length limit in bytes.  After a match
     this long is found, the search is finished.  Valid values range from 5
     to 273.  Larger values usually give better compression ratios but longer
     compression times.
'-n N'
'--threads=N'
     Set the maximum number of worker threads, overriding the system's
     default.  Valid values range from 1 to "as many as your system can
     support".  If this option is not used, plzip tries to detect the number
     of processors in the system and use it as default value.  When
     compressing on a 32 bit system, plzip tries to limit the memory use to
     under 2.22 GiB (4 worker threads at level -9) by reducing the number
     of threads below the system's default.  'plzip --help' shows the
     system's default value.

     Plzip starts the number of threads required by each file without
     exceeding the value specified.  Note that the number of usable threads
     is limited to ceil( file_size / data_size ) during compression (*note
     Minimum file sizes::), and to the number of members in the input
     during decompression.  You can find the number of members in a lzip
     file by running 'plzip -lv file.lz'.
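The thread limit during compression can be sketched as follows (an illustrative helper, not plzip's code; the default data size at level -6 is two times the 8 MiB dictionary, i.e. 16 MiB):

```python
import math

def usable_threads(file_size, data_size, max_threads):
    """Worker threads plzip can actually keep busy when compressing one
    file: one per input data block, capped at the number requested."""
    return min(max_threads, math.ceil(file_size / data_size))

# A 100 MiB file compressed with the default -6 data size of 16 MiB can
# use at most ceil(100/16) = 7 workers, however many processors exist.
print(usable_threads(100 * 2**20, 16 * 2**20, 64))
```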
'-o FILE'
'--output=FILE'
     If '-c' has not also been specified, write the (de)compressed output to
     FILE; keep input files unchanged.  If compressing several files, each
     file is compressed independently.  This option (or '-c') is needed when
     reading from a named pipe (fifo) or from a device.  '-o -' is
     equivalent to '-c'.  '-o' has no effect when testing or listing.
In order to keep backward compatibility with plzip versions prior to
1.9, when compressing from standard input and no other file names are
given, the extension '.lz' is appended to FILE unless it already ends
in '.lz' or '.tlz'. This feature will be removed in a future version
of plzip. Meanwhile, redirection may be used instead of '-o' to write
the compressed output to a file without the extension '.lz' in its
name: 'plzip < file > foo'.
'-q'
'--quiet'
@ -292,30 +305,28 @@ command line.
'-s BYTES'
'--dictionary-size=BYTES'
     When compressing, set the dictionary size limit in bytes.  Plzip will
     use for each file the largest dictionary size that exceeds neither the
     file size nor this limit.  Valid values range from 4 KiB to 512 MiB.
     Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29
     bytes.  Dictionary sizes are quantized so that they can be coded in
     just one byte (*note coded-dict-size::).  If the size specified does
     not match one of the valid sizes, it will be rounded upwards by adding
     up to (BYTES / 8) to it.

     For maximum compression you should use a dictionary size limit as large
     as possible, but keep in mind that the decompression memory requirement
     is affected at compression time by the choice of dictionary size limit.
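The quantization described above can be sketched as follows (an illustrative helper, not plzip's code): valid sizes are a power of two minus 0 to 7 sixteenths of it, and a request is rounded up to the nearest valid size.

```python
def round_dict_size(size):
    """Round a requested dictionary size up to the nearest size codable in
    one byte: 2^n minus f/16 of 2^n (f = 0..7), within 4 KiB .. 512 MiB."""
    valid = sorted(v for n in range(12, 30) for f in range(8)
                   if (v := (1 << n) - f * ((1 << n) // 16)) >= (1 << 12))
    for v in valid:
        if v >= size:
            return v
    return valid[-1]           # requests above 512 MiB are clamped

print(round_dict_size(65537))  # one byte above 64 KiB rounds up to 72 KiB
```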
'-t'
'--test'
     Check integrity of the files specified, but don't decompress them.  This
     really performs a trial decompression and throws away the result.  Use
     it together with '-v' to see information about the files.  If a file
     fails the test, does not exist, can't be opened, or is a terminal,
     plzip continues checking the rest of the files.  A final diagnostic is
     shown at verbosity level 1 or higher if any file fails the test when
     testing multiple files.
'-v'
'--verbose'
@ -323,26 +334,26 @@ command line.
     When compressing, show the compression ratio and size for each file
     processed.

     When decompressing or testing, further -v's (up to 4) increase the
     verbosity level, showing status, compression ratio, dictionary size,
     decompressed size, and compressed size.

     Two or more '-v' options show the progress of (de)compression, except
     for single-member files.
'-0 .. -9'
     Compression level.  Set the compression parameters (dictionary size and
     match length limit) as shown in the table below.  The default
     compression level is '-6', equivalent to '-s8MiB -m36'.  Note that '-9'
     can be much slower than '-0'.  These options have no effect when
     decompressing, testing, or listing.

     The bidimensional parameter space of LZMA can't be mapped to a linear
     scale optimal for all files.  If your files are large, very repetitive,
     etc, you may need to use the options '--dictionary-size' and
     '--match-length' directly to achieve optimal performance.

     If several compression levels or '-s' or '-m' options are given, the
     last setting is used.  For example '-9 -s64MiB' is equivalent to
     '-s64MiB -m273'.

     Level   Dictionary size (-s)   Match length limit (-m)
     -0      64 KiB                 16 bytes
@ -361,23 +372,33 @@ command line.
     Aliases for GNU gzip compatibility.

'--loose-trailing'
     When decompressing, testing, or listing, allow trailing data whose
     first bytes are so similar to the magic bytes of a lzip header that
     they can be confused with a corrupt header.  Use this option if a file
     triggers a "corrupt header" error and the cause is not indeed a
     corrupt header.

'--in-slots=N'
     Number of 1 MiB input packets buffered per worker thread when
     decompressing from non-seekable input.  Increasing the number of packets
     may increase decompression speed, but requires more memory.  Valid
     values range from 1 to 64.  The default value is 4.

'--out-slots=N'
     Number of 1 MiB output packets buffered per worker thread when
     decompressing to non-seekable output.  Increasing the number of packets
     may increase decompression speed, but requires more memory.  Valid
     values range from 1 to 1024.  The default value is 64.
'--check-lib'
Compare the version of lzlib used to compile plzip with the version
actually being used at run time and exit. Report any differences
found. Exit with error status 1 if differences are found. A mismatch
may indicate that lzlib is not correctly installed or that a different
version of lzlib has been installed after compiling plzip.
'plzip -v --check-lib' shows the version of lzlib being used and the
value of 'LZ_API_VERSION' (if defined). *Note Library version:
(lzlib)Library version.
Numbers given as arguments to options may be followed by a multiplier
@ -396,36 +417,36 @@ Z zettabyte (10^21) | Zi zebibyte (2^70)
Y yottabyte (10^24) |  Yi yobibyte (2^80)
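The multiplier table can be sketched as a parser (illustrative only; a trailing 'B', as in '-s8MiB', is also accepted in this sketch):

```python
def parse_arg(arg):
    """Parse a numeric option argument with an optional multiplier suffix
    from the table above (a sketch, not plzip's actual argument parser)."""
    if arg.endswith('B'):                        # allow '8MiB' style
        arg = arg[:-1]
    for i, letter in enumerate('KMGTPEZY'):
        if arg.endswith(letter + 'i'):           # binary: Ki = 2^10, ...
            return int(arg[:-2]) * 2 ** (10 * (i + 1))
        if arg.endswith(letter):                 # decimal: K = 10^3, ...
            return int(arg[:-1]) * 10 ** (3 * (i + 1))
    return int(arg)

print(parse_arg('8MiB'))  # -> 8388608
```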
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
input file, 3 for an internal consistency error (eg, bug) which caused
plzip to panic.
 
File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top
4 Internal structure of plzip
*****************************

When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as worker threads are chosen, creating a
multimember compressed file.

When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen.  Files that were compressed with lzip will not be
decompressed faster than using lzip (unless the option '-b' was used)
because lzip usually produces single-member files, which can't be
decompressed in parallel.

For each input file, a splitter thread and several worker threads are
created, with the main thread acting as muxer (multiplexer) thread.  A
"packet courier" takes care of data transfers among threads and limits the
maximum number of data blocks (packets) being processed simultaneously.

The splitter reads data blocks from the input file, and distributes them
to the workers.  The workers (de)compress the blocks received from the
splitter.  The muxer collects processed packets from the workers, and writes
them to the output file.
          ,------------,
     ,-->| worker 0 |--,
@ -438,13 +459,12 @@ writes them to the output file.
     `-->| worker N-1 |--'
          `------------'
When decompressing from a regular file, the splitter is removed and the
workers read directly from the input file.  If the output file is also a
regular file, the muxer is also removed and the workers write directly to
the output file.  With these optimizations, the use of RAM is greatly
reduced and the decompression speed of large files with many members is
only limited by the number of processors available and by I/O speed.
 
File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top
@ -458,11 +478,13 @@ when there is no longer anything to take away.
In the diagram below, a box like this:

     +---+
     |   | <-- the vertical bars might be missing
     +---+

represents one byte; a box like this:

     +==============+
     |              |
     +==============+
@ -471,10 +493,11 @@ when there is no longer anything to take away.
A lzip file consists of a series of "members" (compressed data sets).
The members simply appear one after another in the file, with no additional
information before, between, or after them.

Each member has the following structure:

+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -482,17 +505,16 @@ additional information before, between, or after them.

All multibyte values are stored in little endian order.

'ID string (the "magic" bytes)'
     A four byte string, identifying the lzip format, with the value "LZIP"
     (0x4C, 0x5A, 0x49, 0x50).

'VN (version number, 1 byte)'
     Just in case something needs to be modified in the future. 1 for now.

'DS (coded dictionary size, 1 byte)'
     The dictionary size is calculated by taking a power of 2 (the base
     size) and subtracting from it a fraction between 0/16 and 7/16 of the
     base size.
     Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
     Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
     from the base size to obtain the dictionary size.
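The DS computation described above can be sketched with shell arithmetic. This is a hedged illustration; the `decode_ds` helper name is ours, not part of plzip:

```shell
# Decode a DS byte into the dictionary size in bytes.
decode_ds() {
  base_log=$(( $1 & 31 ))          # bits 4-0: base 2 logarithm of the base size
  numerator=$(( ($1 >> 5) & 7 ))   # bits 7-5: numerator of the fraction to subtract
  base=$(( 1 << base_log ))
  echo $(( base - numerator * (base / 16) ))
}

decode_ds 23    # DS = 23 -> 2^23 = 8388608 bytes (8 MiB)
decode_ds 149   # DS = (4 << 5) | 21 -> 2^21 - 4/16 * 2^21 = 1572864 bytes (1.5 MiB)
```

Note how the 7/16 fractions give intermediate dictionary sizes between consecutive powers of two, while still fitting the whole size in one byte.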
@@ -505,16 +527,16 @@ additional information before, between, or after them.

     format, for a complete description.

'CRC32 (4 bytes)'
     Cyclic Redundancy Check (CRC) of the uncompressed original data.

'Data size (8 bytes)'
     Size of the uncompressed original data.

'Member size (8 bytes)'
     Total size of the member, including header and trailer. This field acts
     as a distributed index, allows the verification of stream integrity,
     and facilitates safe recovery of undamaged members from multimember
     files.
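As a sketch of how the distributed index works, the following reads the little-endian 'Member size' field from the last 8 bytes of a file; a decoder can then seek backwards that many bytes to find the start of the last member. The `member_size` helper name is ours, not a plzip interface:

```shell
# Read the 8-byte little-endian 'Member size' field at the end of a lzip file.
member_size() {
  total=0; mult=1
  for byte in $(tail -c 8 "$1" | od -An -v -t u1); do
    total=$(( total + byte * mult ))   # accumulate bytes, least significant first
    mult=$(( mult * 256 ))
  done
  echo "$total"
}
# Usage sketch: member_size file.lz  -> size in bytes of the last member
```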
 
@@ -526,20 +548,20 @@ File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev:

The amount of memory required *per worker thread* for decompression or
testing is approximately the following:

   * For decompression of a regular (seekable) file to another regular file,
     or for testing of a regular file; the dictionary size.

   * For testing of a non-seekable file or of standard input; the dictionary
     size plus 1 MiB plus up to the number of 1 MiB input packets buffered
     (4 by default).

   * For decompression of a regular file to a non-seekable file or to
     standard output; the dictionary size plus up to the number of 1 MiB
     output packets buffered (64 by default).

   * For decompression of a non-seekable file or of standard input; the
     dictionary size plus 1 MiB plus up to the number of 1 MiB input and
     output packets buffered (68 by default).

The amount of memory required *per worker thread* for compression is
approximately the following:
@@ -550,9 +572,8 @@ approximately the following:

   * For compression at other levels; 11 times the dictionary size plus
     3.375 times the data size. Default is 142 MiB.
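As a quick check of the formula, assuming the default level's 8 MiB dictionary and its 16 MiB default data size (two times the dictionary size):

```shell
# Per-thread compression memory: 11 * dictionary + 3.375 * data size (in MiB).
awk 'BEGIN { print 11 * 8 + 3.375 * 16 }'   # prints 142
```

This matches the 142 MiB default stated above.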
The following table shows the memory required *per thread* for compression
at a given level, using the default data size for each level:

Level   Memory required
-0      4.875 MiB
@@ -572,22 +593,22 @@ File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory

7 Minimum file sizes required for full compression speed
********************************************************

When compressing, plzip divides the input file into chunks and compresses
as many chunks simultaneously as worker threads are chosen, creating a
multimember compressed file.

For this to work as expected (and roughly multiply the compression speed
by the number of available processors), the uncompressed file must be at
least as large as the number of worker threads times the chunk size (*note
--data-size::). Else some processors will not get any data to compress, and
compression will be proportionally slower. The maximum speed increase
achievable on a given file is limited by the ratio (file_size / data_size).
For example, a tarball the size of gcc or linux will scale up to 10 or 14
processors at level -9.
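The chunk count that limits scaling is ceil(file_size / data_size); a sketch using the 64 MiB default data size of level -9 (two times its 32 MiB dictionary):

```shell
# Number of chunks (and thus usable worker threads) for a given file size.
chunks() { echo $(( ($1 + $2 - 1) / $2 )); }   # integer ceiling division

data_size=$(( 64 * 1024 * 1024 ))                # level -9 default data size
chunks $(( 512 * 1024 * 1024 )) "$data_size"     # a 512 MiB file -> 8 chunks
chunks $(( 10 * 1024 * 1024 )) "$data_size"      # a 10 MiB file -> 1 chunk, no speedup
```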
The following table shows the minimum uncompressed file size needed for
full use of N processors at a given compression level, using the default
data size for each level:

Processors       2       4       8      16      64     256
------------------------------------------------------------------
@@ -612,43 +633,40 @@ File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file siz

Sometimes extra data are found appended to a lzip file after the last
member. Such trailing data may be:

   * Padding added to make the file size a multiple of some block size, for
     example when writing to a tape. It is safe to append any amount of
     padding zero bytes to a lzip file.

   * Useful data added by the user; a cryptographically secure hash, a
     description of file contents, etc. It is safe to append any amount of
     text to a lzip file as long as none of the first four bytes of the text
     match the corresponding byte in the string "LZIP", and the text does
     not contain any zero bytes (null characters). Nonzero bytes and zero
     bytes can't be safely mixed in trailing data.

   * Garbage added by some not totally successful copy operation.

   * Malicious data added to the file in order to make its total size and
     hash value (for a chosen hash) coincide with those of another file.

   * In rare cases, trailing data could be the corrupt header of another
     member. In multimember or concatenated files the probability of
     corruption happening in the magic bytes is 5 times smaller than the
     probability of getting a false positive caused by the corruption of the
     integrity information itself. Therefore it can be considered to be
     below the noise level. Additionally, the test used by plzip to
     discriminate trailing data from a corrupt header has a Hamming
     distance (HD) of 3, and the 3 bit flips must happen in different magic
     bytes for the test to fail. In any case, the option '--trailing-error'
     guarantees that any corrupt header will be detected.

Trailing data are in no way part of the lzip file format, but tools
reading lzip files are expected to behave as correctly and usefully as
possible in the presence of trailing data.

Trailing data can be safely ignored in most cases. In some cases, like
that of user-added data, they are expected to be ignored. In those cases
where a file containing trailing data must be rejected, the option
'--trailing-error' can be used. *Note --trailing-error::.
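For instance, padding a compressed file to a 512-byte block boundary with zero bytes is safe. A sketch; 'archive.lz' is a placeholder name, and plzip itself is not invoked here:

```shell
# Append one 512-byte block of zero padding. 'plzip -d archive.lz' still
# decompresses the file, while 'plzip -d --trailing-error' would reject it.
dd if=/dev/zero bs=512 count=1 >> archive.lz 2>/dev/null
```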
 
@@ -660,62 +678,70 @@ File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: T

WARNING! Even if plzip is bug-free, other causes may result in a corrupt
compressed file (bugs in the system libraries, memory errors, etc).
Therefore, if the data you are going to compress are important, give the
option '--keep' to plzip and don't remove the original file until you
verify the compressed file with a command like
'plzip -cd file.lz | cmp file -'. Most RAM errors happening during
compression can only be detected by comparing the compressed file with the
original because the corruption happens before plzip compresses the RAM
contents, resulting in a valid compressed file containing wrong data.

Example 1: Extract all the files from archive 'foo.tar.lz'.

     tar -xf foo.tar.lz
or
     plzip -cd foo.tar.lz | tar -xf -

Example 2: Replace a regular file with its compressed version 'file.lz' and
show the compression ratio.

     plzip -v file
Example 3: Like example 2 but the created 'file.lz' has a block size of
1 MiB. The compression ratio is not shown.
     plzip -B 1MiB file

Example 4: Restore a regular file from its compressed version 'file.lz'. If
the operation is successful, 'file.lz' is removed.

     plzip -d file.lz

Example 5: Verify the integrity of the compressed file 'file.lz' and show
status.

     plzip -tv file.lz

Example 6: Compress a whole device in /dev/sdc and send the output to
'file.lz'.

     plzip -c /dev/sdc > file.lz
or
     plzip /dev/sdc -o file.lz

Example 7: The right way of concatenating the decompressed output of two or
more compressed files. *Note Trailing data::.

Don't do this
     cat file1.lz file2.lz file3.lz | plzip -d -
Do this instead
     plzip -cd file1.lz file2.lz file3.lz

Example 8: Decompress 'file.lz' partially until 10 KiB of decompressed data
are produced.

     plzip -cd file.lz | dd bs=1024 count=10

Example 9: Decompress 'file.lz' partially from decompressed byte at offset
10000 to decompressed byte at offset 14999 (5000 bytes are produced).

     plzip -cd file.lz | dd bs=1000 skip=10 count=5
@@ -725,14 +751,14 @@ File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: T

10 Reporting bugs
*****************

There are probably bugs in plzip. There are certainly errors and omissions
in this manual. If you report them, they will get fixed. If you don't, no
one will ever know about them and they will remain unfixed for all
eternity, if not longer.

If you find a bug in plzip, please send electronic mail to
<lzip-bug@nongnu.org>. Include the version number, which you can find by
running 'plzip --version'.
 
File: plzip.info, Node: Concept index, Prev: Problems, Up: Top

@@ -762,21 +788,21 @@ Concept index
 
Tag Table:
Node: Top222
Node: Introduction1159
Node: Output5788
Node: Invoking plzip7351
Ref: --trailing-error8146
Ref: --data-size8384
Node: Program design18364
Node: File format20542
Ref: coded-dict-size21840
Node: Memory requirements22995
Node: Minimum file sizes24677
Node: Trailing data26693
Node: Examples28961
Ref: concat-example30556
Node: Problems31153
Node: Concept index31681
 
End Tag Table


@@ -6,8 +6,8 @@
@finalout
@c %**end of header

@set UPDATED 3 January 2021
@set VERSION 1.9

@dircategory Data Compression
@direntry

@@ -29,6 +29,7 @@
@contents
@end ifnothtml

@ifnottex
@node Top
@top
@@ -49,35 +50,47 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).

@end menu

@sp 1
Copyright @copyright{} 2009-2021 Antonio Diaz Diaz.

This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.

@end ifnottex

@node Introduction
@chapter Introduction
@cindex introduction

@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip}
is a massively parallel (multi-threaded) implementation of lzip, fully
compatible with lzip 1.4 or newer. Plzip uses the compression library
@uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.

@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip}
is a lossless data compressor with a user interface similar to the one
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
interoperability. Lzip can compress about as fast as gzip @w{(lzip -0)} or
compress most files more than bzip2 @w{(lzip -9)}. Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
a data recovery perspective. Lzip has been designed, written, and tested
with great care to replace gzip and bzip2 as the standard general-purpose
compressed format for unix-like systems.

Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
to 2 percent larger compressed files). Note that the number of usable
threads is limited by file size; on files larger than a few GB plzip can use
hundreds of processors, but on files of only a few MB plzip is no faster
than lzip. @xref{Minimum file sizes}.

For creation and manipulation of compressed tar archives
@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be
more efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members.
@ifnothtml
@xref{Top,tarlz manual,,tarlz}.
@end ifnothtml

The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:
@@ -85,11 +98,11 @@ taking into account both data integrity and decoder availability:

@itemize @bullet
@item
The lzip format provides very safe integrity checking and some data
recovery means. The program
@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
can repair bit flip errors (one of the most common forms of data corruption)
in lzip files, and provides data recovery capabilities, including
error-checked merging of damaged copies of a file.
@ifnothtml
@xref{Data safety,,,lziprecover}.
@end ifnothtml

@@ -107,10 +120,10 @@ Additionally the lzip reference implementation is copylefted, which
guarantees that it will remain free forever.
@end itemize

A nice feature of the lzip format is that a corrupt byte is easier to repair
the nearer it is from the beginning of the file. Therefore, with the help of
lziprecover, losing an entire archive just because of a corrupt byte near
the beginning is a thing of the past.

Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
@@ -138,13 +151,12 @@ possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).

Plzip is able to read from some types of non-regular files if either the
option @samp{-c} or the option @samp{-o} is specified.

Plzip will refuse to read compressed data from a terminal or write compressed
data to a terminal, as this would be entirely incomprehensible and might
leave the terminal in an abnormal state.

Plzip will correctly decompress a file which is the concatenation of two or
more compressed files. The result is the concatenation of the corresponding
@@ -162,16 +174,16 @@ The output of plzip looks like this:

plzip -v foo
  foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.

plzip -tvvv foo.lz
  foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok
@end example

The meaning of each field is as follows:

@table @code
@item N:1
The compression ratio @w{(uncompressed_size / compressed_size)}, shown as
@w{N to 1}.

@item ratio
The inverse compression ratio @w{(compressed_size / uncompressed_size)},

@@ -182,23 +194,23 @@ decimal point two places to the left; @w{14.98% = 0.1498}.
The space saved by compression @w{(1 - ratio)}, shown as a percentage.

@item in
Size of the input data. This is the uncompressed size when compressing, or
the compressed size when decompressing or testing. Note that plzip always
prints the uncompressed size before the compressed size when compressing,
decompressing, testing, or listing.

@item out
Size of the output data. This is the compressed size when compressing, or
the decompressed size when decompressing or testing.
@end table
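The byte counts and the ratio fields in the example output above are mutually consistent, as this quick check reproduces:

```shell
# Reproduce the ratio fields from the 450560 in / 67493 out example above.
awk 'BEGIN {
  usize = 450560; csize = 67493
  printf "%.3f:1, %.2f%% ratio, %.2f%% saved\n",
         usize / csize,                # compression ratio, N to 1
         100 * csize / usize,          # inverse compression ratio
         100 * (1 - csize / usize)     # space saved
}'
# prints: 6.676:1, 14.98% ratio, 85.02% saved
```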
When decompressing or testing at verbosity level 4 (-vvvv), the dictionary
size used to compress the file is also shown.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
been compressed. Decompressed is used to refer to data which have undergone
the process of decompression.

@node Invoking plzip

@@ -215,11 +227,16 @@ plzip [@var{options}] [@var{files}]
@end example

@noindent
If no file names are specified, plzip compresses (or decompresses) from
standard input to standard output. A hyphen @samp{-} used as a @var{file}
argument means standard input. It can be mixed with other @var{files} and is
read just once, the first time it appears in the command line.

plzip supports the following
@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
@ifnothtml
@xref{Argument syntax,,,arg_parser}.
@end ifnothtml

@table @code
@item -h
@@ -246,18 +263,20 @@ input file will be divided in chunks of this size before compression is
performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value
is two times the dictionary size, except for option @samp{-0} where it
defaults to @w{1 MiB}. Plzip will reduce the dictionary size if it is
larger than the data size specified. @xref{Minimum file sizes}.

@item -c
@itemx --stdout
Compress or decompress to standard output; keep input files unchanged. If
compressing several files, each file is compressed independently. This
option (or @samp{-o}) is needed when reading from a named pipe (fifo) or
from a device. Use @w{@samp{lziprecover -cd -i}} to recover as much of the
decompressed data as possible when decompressing a corrupt file. @samp{-c}
overrides @samp{-o}. @samp{-c} has no effect when testing or listing.

@item -d
@itemx --decompress
Decompress the files specified. If a file does not exist or can't be
opened, plzip continues decompressing the rest of the files. If a file
fails to decompress, or is a terminal, plzip exits immediately without
decompressing the rest of the files.
@@ -277,17 +296,18 @@ Keep (don't delete) input files during compression or decompression.

@item -l
@itemx --list
Print the uncompressed size, compressed size, and percentage saved of the
files specified. Trailing data are ignored. The values produced are correct
even for multimember files. If more than one file is given, a final line
containing the cumulative sizes is printed. With @samp{-v}, the dictionary
size, the number of members in the file, and the amount of trailing data (if
any) are also printed. With @samp{-vv}, the positions and sizes of each
member in multimember files are also printed.

@samp{-lq} can be used to verify quickly (without decompressing) the
structural integrity of the files specified. (Use @samp{--test} to verify
the data integrity). @samp{-alq} additionally verifies that none of the
files specified contain trailing data.

@item -m @var{bytes}
@itemx --match-length=@var{bytes}
@ -298,27 +318,36 @@ compression times.
@item -n @var{n} @item -n @var{n}
@itemx --threads=@var{n} @itemx --threads=@var{n}
Set the number of worker threads, overriding the system's default. Valid Set the maximum number of worker threads, overriding the system's default.
values range from 1 to "as many as your system can support". If this Valid values range from 1 to "as many as your system can support". If this
option is not used, plzip tries to detect the number of processors in option is not used, plzip tries to detect the number of processors in the
the system and use it as default value. When compressing on a @w{32 bit} system and use it as default value. When compressing on a @w{32 bit} system,
system, plzip tries to limit the memory use to under @w{2.22 GiB} (4 plzip tries to limit the memory use to under @w{2.22 GiB} (4 worker threads
worker threads at level -9) by reducing the number of threads below the at level -9) by reducing the number of threads below the system's default.
system's default. @w{@samp{plzip --help}} shows the system's default @w{@samp{plzip --help}} shows the system's default value.
value.
Note that the number of usable threads is limited to @w{ceil( file_size Plzip starts the number of threads required by each file without exceeding
/ data_size )} during compression (@pxref{Minimum file sizes}), and to the value specified. Note that the number of usable threads is limited to
the number of members in the input during decompression. @w{ceil( file_size / data_size )} during compression (@pxref{Minimum file
sizes}), and to the number of members in the input during decompression. You
can find the number of members in a lzip file by running
@w{@samp{plzip -lv file.lz}}.
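The ceiling formula above can be sketched as a small standalone helper (hypothetical, not part of plzip's source; the default data size of @w{16 MiB} at level -6 is an assumption, twice the 8 MiB dictionary):

```cpp
#include <cassert>

// Sketch of the usable-threads limit described above: during compression
// at most ceil( file_size / data_size ) worker threads can do useful work.
// (Hypothetical helper, not part of plzip's actual source.)
long usable_threads( const unsigned long long file_size,
                     const unsigned long long data_size )
  { return ( file_size + data_size - 1 ) / data_size; }	// integer ceiling
```

With a data size of @w{16 MiB}, a @w{100 MiB} file can keep at most 7 worker threads busy, no matter how many processors are available.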
@item -o @var{file}
@itemx --output=@var{file}
If @samp{-c} has not also been specified, write the (de)compressed output to
@var{file}; keep input files unchanged. If compressing several files, each
file is compressed independently. This option (or @samp{-c}) is needed when
reading from a named pipe (fifo) or from a device. @w{@samp{-o -}} is
equivalent to @samp{-c}. @samp{-o} has no effect when testing or listing.

In order to keep backward compatibility with plzip versions prior to 1.9,
when compressing from standard input and no other file names are given, the
extension @samp{.lz} is appended to @var{file} unless it already ends in
@samp{.lz} or @samp{.tlz}. This feature will be removed in a future version
of plzip. Meanwhile, redirection may be used instead of @samp{-o} to write
the compressed output to a file without the extension @samp{.lz} in its
name: @w{@samp{plzip < file > foo}}.
@item -q
@itemx --quiet
@@ -331,7 +360,7 @@ for each file the largest dictionary size that does not exceed neither
the file size nor this limit. Valid values range from @w{4 KiB} to
@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
coded in just one byte (@pxref{coded-dict-size}). If the size specified
does not match one of the valid sizes, it will be rounded upwards by
adding up to @w{(@var{bytes} / 8)} to it.
@@ -341,12 +370,13 @@ is affected at compression time by the choice of dictionary size limit.
@item -t
@itemx --test
Check integrity of the files specified, but don't decompress them. This
really performs a trial decompression and throws away the result. Use it
together with @samp{-v} to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal, plzip
continues checking the rest of the files. A final diagnostic is shown at
verbosity level 1 or higher if any file fails the test when testing
multiple files.
@item -v
@itemx --verbose
@@ -364,12 +394,12 @@ Compression level. Set the compression parameters (dictionary size and
match length limit) as shown in the table below. The default compression
level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
@samp{-9} can be much slower than @samp{-0}. These options have no
effect when decompressing, testing, or listing.

The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc, you may need to use the options @samp{--dictionary-size} and
@samp{--match-length} directly to achieve optimal performance.

If several compression levels or @samp{-s} or @samp{-m} options are
given, the last setting is used. For example @w{@samp{-9 -s64MiB}} is
@@ -394,7 +424,7 @@ equivalent to @w{@samp{-s64MiB -m273}}
Aliases for GNU gzip compatibility.

@item --loose-trailing
When decompressing, testing, or listing, allow trailing data whose first
bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
"corrupt header" error and the cause is not indeed a corrupt header.
@@ -411,6 +441,19 @@ decompressing to non-seekable output. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid values
range from 1 to 1024. The default value is 64.
@item --check-lib
Compare the
@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
used to compile plzip with the version actually being used at run time and
exit. Report any differences found. Exit with error status 1 if differences
are found. A mismatch may indicate that lzlib is not correctly installed or
that a different version of lzlib has been installed after compiling plzip.
@w{@samp{plzip -v --check-lib}} shows the version of lzlib being used and
the value of @samp{LZ_API_VERSION} (if defined).
@ifnothtml
@xref{Library version,,,lzlib}.
@end ifnothtml
@end table

Numbers given as arguments to options may be followed by a multiplier
@@ -438,16 +481,16 @@ caused plzip to panic.

@node Program design
@chapter Internal structure of plzip
@cindex program design

When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as worker threads are chosen, creating a
multimember compressed file.

When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen. Files that were compressed with lzip will not
be decompressed faster than using lzip (unless the option @samp{-b} was used)
because lzip usually produces single-member files, which can't be
decompressed in parallel.
@@ -492,6 +535,7 @@ when there is no longer anything to take away.@*
@sp 1
In the diagram below, a box like this:

@verbatim
+---+
|   |   <-- the vertical bars might be missing
@@ -499,6 +543,7 @@ In the diagram below, a box like this:
@end verbatim

represents one byte; a box like this:

@verbatim
+==============+
|              |
@@ -513,6 +558,7 @@ The members simply appear one after another in the file, with no
additional information before, between, or after them.

Each member has the following structure:

@verbatim
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
@@ -532,8 +578,7 @@ Just in case something needs to be modified in the future. 1 for now.

@anchor{coded-dict-size}
@item DS (coded dictionary size, 1 byte)
The dictionary size is calculated by taking a power of 2 (the base size)
and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
from the base size to obtain the dictionary size.@*
@@ -541,8 +586,8 @@ Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
Valid values for dictionary size range from 4 KiB to 512 MiB.
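The decoding just described can be sketched as follows (this mirrors the rule in the text, including the 0xD3 example; it is an illustration, not a copy of plzip's @code{Lzip_header::dictionary_size}):

```cpp
#include <cassert>
#include <stdint.h>

// Decode the DS byte: base size is 2^(bits 4-0); bits 7-5 give the
// number of sixteenths of the base size to subtract from it.
unsigned decode_ds( const uint8_t ds )
  {
  unsigned sz = 1 << ( ds & 0x1F );		// base size = 2^(bits 4-0)
  if( sz > ( 1 << 12 ) )			// never go below 4 KiB
    sz -= ( sz / 16 ) * ( ( ds >> 5 ) & 7 );	// subtract fraction of base
  return sz;
  }
```

For 0xD3 this yields 2^19 - 6 * 2^15 = 320 KiB, matching the example above.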
@item LZMA stream
The LZMA stream, finished by an end of stream marker. Uses default values
for encoder properties.
@ifnothtml
@xref{Stream format,,,lzip},
@end ifnothtml
@@ -553,7 +598,7 @@ See
for a complete description.

@item CRC32 (4 bytes)
Cyclic Redundancy Check (CRC) of the uncompressed original data.

@item Data size (8 bytes)
Size of the uncompressed original data.
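As a worked illustration, the fixed-width trailer fields can be decoded with a single helper (this assumes the little-endian byte order used by the lzip format; the helper itself is hypothetical, not plzip code):

```cpp
#include <cassert>
#include <stdint.h>

// Decode an unsigned little-endian field of 'size' bytes, as used by
// the CRC32 (4 bytes), Data size (8), and Member size (8) fields.
unsigned long long le_field( const uint8_t * const buf, const int size )
  {
  unsigned long long v = 0;
  for( int i = size - 1; i >= 0; --i ) v = ( v << 8 ) + buf[i];
  return v;
  }
```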
@@ -570,8 +615,8 @@ facilitates safe recovery of undamaged members from multimember files.
@chapter Memory required to compress and decompress
@cindex memory requirements

The amount of memory required @strong{per worker thread} for decompression
or testing is approximately the following:

@itemize @bullet
@item
@@ -610,8 +655,7 @@ times the data size. Default is @w{142 MiB}.
@noindent
The following table shows the memory required @strong{per thread} for
compression at a given level, using the default data size for each level:
@multitable {Level} {Memory required}
@item Level @tab Memory required
@@ -643,7 +687,7 @@ least as large as the number of worker threads times the chunk size
compress, and compression will be proportionally slower. The maximum
speed increase achievable on a given file is limited by the ratio
@w{(file_size / data_size)}. For example, a tarball the size of gcc or
linux will scale up to 10 or 14 processors at level -9.

The following table shows the minimum uncompressed file size needed for
full use of N processors at a given compression level, using the default
@@ -723,7 +767,7 @@ where a file containing trailing data must be rejected, the option

WARNING! Even if plzip is bug-free, other causes may result in a corrupt
compressed file (bugs in the system libraries, memory errors, etc).
Therefore, if the data you are going to compress are important, give the
option @samp{--keep} to plzip and don't remove the original file until you
verify the compressed file with a command like
@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during
compression can only be detected by comparing the compressed file with the
@@ -732,8 +776,18 @@ contents, resulting in a valid compressed file containing wrong data.
@sp 1
@noindent
Example 1: Extract all the files from archive @samp{foo.tar.lz}.

@example
tar -xf foo.tar.lz
or
plzip -cd foo.tar.lz | tar -xf -
@end example

@sp 1
@noindent
Example 2: Replace a regular file with its compressed version @samp{file.lz}
and show the compression ratio.

@example
plzip -v file
@@ -741,8 +795,8 @@ plzip -v file
@sp 1
@noindent
Example 3: Like example 2 but the created @samp{file.lz} has a block size of
@w{1 MiB}. The compression ratio is not shown.

@example
plzip -B 1MiB file
@@ -750,9 +804,8 @@ plzip -B 1MiB file
@sp 1
@noindent
Example 4: Restore a regular file from its compressed version
@samp{file.lz}. If the operation is successful, @samp{file.lz} is removed.

@example
plzip -d file.lz
@@ -760,8 +813,8 @@ plzip -d file.lz
@sp 1
@noindent
Example 5: Verify the integrity of the compressed file @samp{file.lz} and
show status.

@example
plzip -tv file.lz
@@ -769,29 +822,31 @@ plzip -tv file.lz
@sp 1
@noindent
Example 6: Compress a whole device in /dev/sdc and send the output to
@samp{file.lz}.

@example
plzip -c /dev/sdc > file.lz
or
plzip /dev/sdc -o file.lz
@end example

@sp 1
@anchor{concat-example}
@noindent
Example 7: The right way of concatenating the decompressed output of two or
more compressed files. @xref{Trailing data}.

@example
Don't do this
  cat file1.lz file2.lz file3.lz | plzip -d -
Do this instead
  plzip -cd file1.lz file2.lz file3.lz
@end example
@sp 1
@noindent
Example 8: Decompress @samp{file.lz} partially until @w{10 KiB} of
decompressed data are produced.

@example
@@ -800,8 +855,8 @@ plzip -cd file.lz | dd bs=1024 count=10

@sp 1
@noindent
Example 9: Decompress @samp{file.lz} partially from decompressed byte at
offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).

@example
plzip -cd file.lz | dd bs=1000 skip=10 count=5
@@ -820,7 +875,7 @@ for all eternity, if not longer.
If you find a bug in plzip, please send electronic mail to
@email{lzip-bug@@nongnu.org}. Include the version number, which you can
find by running @w{@samp{plzip --version}}.

@node Concept index
list.cc
@@ -1,5 +1,5 @@
/* Plzip - Massively parallel implementation of lzip
   Copyright (C) 2009-2021 Antonio Diaz Diaz.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
@@ -21,7 +21,6 @@
#include <cstring>
#include <string>
#include <vector>
#include <stdint.h>
#include <unistd.h>
#include <sys/stat.h>
@@ -37,11 +36,11 @@ void list_line( const unsigned long long uncomp_size,
                const char * const input_filename )
  {
  if( uncomp_size > 0 )
    std::printf( "%14llu %14llu %6.2f%% %s\n", uncomp_size, comp_size,
                 100.0 - ( ( 100.0 * comp_size ) / uncomp_size ),
                 input_filename );
  else
    std::printf( "%14llu %14llu -INF%% %s\n", uncomp_size, comp_size,
                 input_filename );
  }
@@ -63,15 +62,15 @@ int list_files( const std::vector< std::string > & filenames,
      from_stdin ? "(stdin)" : filenames[i].c_str();
    struct stat in_stats;				// not used
    const int infd = from_stdin ? STDIN_FILENO :
      open_instream( input_filename, &in_stats, false, true );
    if( infd < 0 ) { set_retval( retval, 1 ); continue; }

    const Lzip_index lzip_index( infd, ignore_trailing, loose_trailing );
    close( infd );
    if( lzip_index.retval() != 0 )
      {
      show_file_error( input_filename, lzip_index.error().c_str() );
      set_retval( retval, lzip_index.retval() );
      continue;
      }
    if( verbosity >= 0 )
@@ -79,6 +78,7 @@ int list_files( const std::vector< std::string > & filenames,
    const unsigned long long udata_size = lzip_index.udata_size();
    const unsigned long long cdata_size = lzip_index.cdata_size();
    total_comp += cdata_size; total_uncomp += udata_size; ++files;
    const long members = lzip_index.members();
    if( first_post )
      {
      first_post = false;
@@ -86,25 +86,19 @@ int list_files( const std::vector< std::string > & filenames,
      std::fputs( " uncompressed compressed saved name\n", stdout );
      }
    if( verbosity >= 1 )
      std::printf( "%s %5ld %6lld ",
                   format_ds( lzip_index.dictionary_size() ), members,
                   lzip_index.file_size() - cdata_size );
    list_line( udata_size, cdata_size, input_filename );
    if( verbosity >= 2 && members > 1 )
      {
      std::fputs( " member data_pos data_size member_pos member_size\n", stdout );
      for( long i = 0; i < members; ++i )
        {
        const Block & db = lzip_index.dblock( i );
        const Block & mb = lzip_index.mblock( i );
        std::printf( "%6ld %14llu %14llu %14llu %14llu\n",
                     i + 1, db.pos(), db.size(), mb.pos(), mb.size() );
        }
      first_post = true;	// reprint heading after list of members
lzip.h
@@ -1,5 +1,5 @@
/* Plzip - Massively parallel implementation of lzip
   Copyright (C) 2009-2021 Antonio Diaz Diaz.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
@@ -15,13 +15,11 @@
   along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <pthread.h>

enum {
  min_dictionary_bits = 12,
  min_dictionary_size = 1 << min_dictionary_bits,	// >= modeled_distances
  max_dictionary_bits = 29,
  max_dictionary_size = 1 << max_dictionary_bits,
  min_member_size = 36 };
@@ -88,7 +86,7 @@ struct Lzip_header
  {
  uint8_t data[6];			// 0-3 magic bytes
					// 4 version
					// 5 coded dictionary size
  enum { size = 6 };

  void set_magic() { std::memcpy( data, lzip_magic, 4 ); data[4] = 1; }
@@ -134,6 +132,10 @@ struct Lzip_header
      }
    return true;
    }

  bool verify() const
    { return verify_magic() && verify_version() &&
             isvalid_ds( dictionary_size() ); }
  };
@@ -190,10 +192,14 @@ struct Lzip_trailer
  };


inline void set_retval( int & retval, const int new_val )
  { if( retval < new_val ) retval = new_val; }


const char * const bad_magic_msg = "Bad magic number (file not in lzip format).";
const char * const bad_dict_msg = "Invalid dictionary size in member header.";
const char * const corrupt_mm_msg = "Corrupt header in multimember file.";
const char * const trailing_msg = "Trailing data not allowed.";
const char * const mem_msg = "Not enough memory.";

// defined in compress.cc
int readblock( const int fd, uint8_t * const buf, const int size );
@@ -231,13 +237,19 @@ int dec_stream( const unsigned long long cfile_size,
// defined in decompress.cc
int preadblock( const int fd, uint8_t * const buf, const int size,
                const long long pos );
class Shared_retval;
void decompress_error( struct LZ_Decoder * const decoder,
                       const Pretty_print & pp,
                       Shared_retval & shared_retval, const int worker_id );
void show_results( const unsigned long long in_size,
                   const unsigned long long out_size,
                   const unsigned dictionary_size, const bool testing );
int decompress( const unsigned long long cfile_size, int num_workers,
                const int infd, const int outfd, const Pretty_print & pp,
                const int debug_level, const int in_slots,
                const int out_slots, const bool ignore_trailing,
                const bool loose_trailing, const bool infd_isreg,
                const bool one_to_one );

// defined in list.cc
int list_files( const std::vector< std::string > & filenames,
@@ -249,7 +261,7 @@ const char * bad_version( const unsigned version );
const char * format_ds( const unsigned dictionary_size );
void show_header( const unsigned dictionary_size );
int open_instream( const char * const name, struct stat * const in_statsp,
                   const bool one_to_one, const bool reg_only = false );
void cleanup_and_fail( const int retval = 1 );	// terminate the program
void show_error( const char * const msg, const int errcode = 0,
                 const bool help = false );
@@ -295,3 +307,27 @@ public:
    xunlock( &mutex );
    }
  };
class Shared_retval		// shared return value protected by a mutex
  {
  int retval;
  pthread_mutex_t mutex;
  Shared_retval( const Shared_retval & );	// declared as private
  void operator=( const Shared_retval & );	// declared as private

public:
  Shared_retval() : retval( 0 ) { xinit_mutex( &mutex ); }

  bool set_value( const int val )	// only one thread can set retval > 0
    {					// (and print an error message)
    xlock( &mutex );
    const bool done = ( retval == 0 && val > 0 );
    if( done ) retval = val;
    xunlock( &mutex );
    return done;
    }

  int operator()() const { return retval; }
  };
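The idiom above lets exactly one worker thread "win" the right to report an error. A self-contained sketch (with plzip's x* mutex wrappers replaced by plain pthread calls, so this is an illustration rather than the actual class):

```cpp
#include <cassert>
#include <pthread.h>

// Sketch of the Shared_retval idiom: only the first thread to report a
// nonzero value sets it (and would print the diagnostic); later calls
// return false so the other threads stay quiet.
class Shared_retval_demo
  {
  int retval;
  pthread_mutex_t mutex;
public:
  Shared_retval_demo() : retval( 0 ) { pthread_mutex_init( &mutex, 0 ); }
  bool set_value( const int val )	// true only for the first error
    {
    pthread_mutex_lock( &mutex );
    const bool done = ( retval == 0 && val > 0 );
    if( done ) retval = val;
    pthread_mutex_unlock( &mutex );
    return done;
    }
  int operator()() const { return retval; }
  };
```

Checking the value with `operator()` outside the mutex is safe here because retval only transitions once, from 0 to its final value.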
lzip_index.cc
@@ -1,5 +1,5 @@
/* Plzip - Massively parallel implementation of lzip
   Copyright (C) 2009-2021 Antonio Diaz Diaz.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
@@ -23,7 +23,6 @@
#include <cstring>
#include <string>
#include <vector>
#include <stdint.h>
#include <unistd.h>
@@ -44,6 +43,19 @@ int seek_read( const int fd, uint8_t * const buf, const int size,
} // end namespace


bool Lzip_index::check_header_error( const Lzip_header & header,
                                     const bool first )
  {
  if( !header.verify_magic() )
    { error_ = bad_magic_msg; retval_ = 2; if( first ) bad_magic_ = true;
      return true; }
  if( !header.verify_version() )
    { error_ = bad_version( header.version() ); retval_ = 2; return true; }
  if( !isvalid_ds( header.dictionary_size() ) )
    { error_ = bad_dict_msg; retval_ = 2; return true; }
  return false;
  }
void Lzip_index::set_errno_error( const char * const msg )
  {
  error_ = msg; error_ += std::strerror( errno );
@@ -59,14 +71,24 @@ void Lzip_index::set_num_error( const char * const msg, unsigned long long num )
  }


bool Lzip_index::read_header( const int fd, Lzip_header & header,
                              const long long pos )
  {
  if( seek_read( fd, header.data, Lzip_header::size, pos ) != Lzip_header::size )
    { set_errno_error( "Error reading member header: " ); return false; }
  return true;
  }


// If successful, push last member and set pos to member header.
bool Lzip_index::skip_trailing_data( const int fd, unsigned long long & pos,
                                     const bool ignore_trailing,
                                     const bool loose_trailing )
  {
  if( pos < min_member_size ) return false;
  enum { block_size = 16384,
         buffer_size = block_size + Lzip_trailer::size - 1 + Lzip_header::size };
  uint8_t buffer[buffer_size];
  int bsize = pos % block_size;			// total bytes in buffer
  if( bsize <= buffer_size - block_size ) bsize += block_size;
  int search_size = bsize;			// bytes to search for trailer
@@ -89,26 +111,30 @@ bool Lzip_index::skip_trailing_data( const int fd, long long & pos,
       if( member_size > ipos + i || !trailer.verify_consistency() )
         continue;
       Lzip_header header;
-      if( seek_read( fd, header.data, Lzip_header::size,
-                     ipos + i - member_size ) != Lzip_header::size )
-        { set_errno_error( "Error reading member header: " ); return false; }
-      const unsigned dictionary_size = header.dictionary_size();
-      if( !header.verify_magic() || !header.verify_version() ||
-          !isvalid_ds( dictionary_size ) ) continue;
-      if( (*(const Lzip_header *)( buffer + i )).verify_prefix( bsize - i ) )
-        { error_ = "Last member in input file is truncated or corrupt.";
-          retval_ = 2; return false; }
-      if( !loose_trailing && bsize - i >= Lzip_header::size &&
-          (*(const Lzip_header *)( buffer + i )).verify_corrupt() )
+      if( !read_header( fd, header, ipos + i - member_size ) ) return false;
+      if( !header.verify() ) continue;
+      const Lzip_header & header2 = *(const Lzip_header *)( buffer + i );
+      const bool full_h2 = bsize - i >= Lzip_header::size;
+      if( header2.verify_prefix( bsize - i ) )	// last member
+        {
+        if( !full_h2 ) error_ = "Last member in input file is truncated.";
+        else if( !check_header_error( header2, false ) )
+          error_ = "Last member in input file is truncated or corrupt.";
+        retval_ = 2; return false;
+        }
+      if( !loose_trailing && full_h2 && header2.verify_corrupt() )
         { error_ = corrupt_mm_msg; retval_ = 2; return false; }
       if( !ignore_trailing )
         { error_ = trailing_msg; retval_ = 2; return false; }
       pos = ipos + i - member_size;
+      const unsigned dictionary_size = header.dictionary_size();
       member_vector.push_back( Member( 0, trailer.data_size(), pos,
                                        member_size, dictionary_size ) );
+      if( dictionary_size_ < dictionary_size )
+        dictionary_size_ = dictionary_size;
       return true;
       }
-    if( ipos <= 0 )
+    if( ipos == 0 )
       { set_num_error( "Bad trailer at pos ", pos - Lzip_trailer::size );
         return false; }
     bsize = buffer_size;
@@ -122,7 +148,8 @@ bool Lzip_index::skip_trailing_data( const int fd, long long & pos,
 Lzip_index::Lzip_index( const int infd, const bool ignore_trailing,
                         const bool loose_trailing )
-  : insize( lseek( infd, 0, SEEK_END ) ), retval_( 0 )
+  : insize( lseek( infd, 0, SEEK_END ) ), retval_( 0 ), dictionary_size_( 0 ),
+    bad_magic_( false )
   {
   if( insize < 0 )
     { set_errno_error( "Input file is not seekable: " ); return; }
@@ -133,16 +160,10 @@ Lzip_index::Lzip_index( const int infd, const bool ignore_trailing,
     retval_ = 2; return; }

   Lzip_header header;
-  if( seek_read( infd, header.data, Lzip_header::size, 0 ) != Lzip_header::size )
-    { set_errno_error( "Error reading member header: " ); return; }
-  if( !header.verify_magic() )
-    { error_ = bad_magic_msg; retval_ = 2; return; }
-  if( !header.verify_version() )
-    { error_ = bad_version( header.version() ); retval_ = 2; return; }
-  if( !isvalid_ds( header.dictionary_size() ) )
-    { error_ = bad_dict_msg; retval_ = 2; return; }
+  if( !read_header( infd, header, 0 ) ) return;
+  if( check_header_error( header, true ) ) return;

-  long long pos = insize;		// always points to a header or to EOF
+  unsigned long long pos = insize;	// always points to a header or to EOF
   while( pos >= min_member_size )
     {
     Lzip_trailer trailer;
@@ -150,7 +171,7 @@ Lzip_index::Lzip_index( const int infd, const bool ignore_trailing,
                    pos - Lzip_trailer::size ) != Lzip_trailer::size )
       { set_errno_error( "Error reading member trailer: " ); break; }
     const unsigned long long member_size = trailer.member_size();
-    if( member_size > (unsigned long long)pos || !trailer.verify_consistency() )
+    if( member_size > pos || !trailer.verify_consistency() )	// bad trailer
       {
       if( member_vector.empty() )
         { if( skip_trailing_data( infd, pos, ignore_trailing, loose_trailing ) )
@@ -158,12 +179,8 @@ Lzip_index::Lzip_index( const int infd, const bool ignore_trailing,
       set_num_error( "Bad trailer at pos ", pos - Lzip_trailer::size );
       break;
       }
-    if( seek_read( infd, header.data, Lzip_header::size,
-                   pos - member_size ) != Lzip_header::size )
-      { set_errno_error( "Error reading member header: " ); break; }
-    const unsigned dictionary_size = header.dictionary_size();
-    if( !header.verify_magic() || !header.verify_version() ||
-        !isvalid_ds( dictionary_size ) )
+    if( !read_header( infd, header, pos - member_size ) ) break;
+    if( !header.verify() )			// bad header
       {
       if( member_vector.empty() )
         { if( skip_trailing_data( infd, pos, ignore_trailing, loose_trailing ) )
@@ -172,8 +189,11 @@ Lzip_index::Lzip_index( const int infd, const bool ignore_trailing,
       break;
       }
     pos -= member_size;
+    const unsigned dictionary_size = header.dictionary_size();
     member_vector.push_back( Member( 0, trailer.data_size(), pos,
                                      member_size, dictionary_size ) );
+    if( dictionary_size_ < dictionary_size )
+      dictionary_size_ = dictionary_size;
     }
   if( pos != 0 || member_vector.empty() )
     {
lzip_index.h
@@ -1,5 +1,5 @@
 /* Plzip - Massively parallel implementation of lzip
-   Copyright (C) 2009-2019 Antonio Diaz Diaz.
+   Copyright (C) 2009-2021 Antonio Diaz Diaz.

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -52,10 +52,14 @@ class Lzip_index
   std::string error_;
   const long long insize;
   int retval_;
+  unsigned dictionary_size_;	// largest dictionary size in the file
+  bool bad_magic_;		// bad magic in first header

+  bool check_header_error( const Lzip_header & header, const bool first );
   void set_errno_error( const char * const msg );
   void set_num_error( const char * const msg, unsigned long long num );
-  bool skip_trailing_data( const int fd, long long & pos,
+  bool read_header( const int fd, Lzip_header & header, const long long pos );
+  bool skip_trailing_data( const int fd, unsigned long long & pos,
                            const bool ignore_trailing, const bool loose_trailing );

 public:
@@ -65,6 +69,8 @@ public:
   long members() const { return member_vector.size(); }
   const std::string & error() const { return error_; }
   int retval() const { return retval_; }
+  unsigned dictionary_size() const { return dictionary_size_; }
+  bool bad_magic() const { return bad_magic_; }

   long long udata_size() const
     { if( member_vector.empty() ) return 0;
295 main.cc
@@ -1,6 +1,6 @@
 /* Plzip - Massively parallel implementation of lzip
    Copyright (C) 2009 Laszlo Ersek.
-   Copyright (C) 2009-2019 Antonio Diaz Diaz.
+   Copyright (C) 2009-2021 Antonio Diaz Diaz.

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -34,7 +34,6 @@
 #include <string>
 #include <vector>
 #include <fcntl.h>
-#include <pthread.h>
 #include <stdint.h>
 #include <unistd.h>
 #include <utime.h>
@@ -73,8 +72,8 @@ int verbosity = 0;
 namespace {

 const char * const program_name = "plzip";
-const char * const program_year = "2019";
-const char * invocation_name = 0;
+const char * const program_year = "2021";
+const char * invocation_name = program_name;	// default value

 const struct { const char * from; const char * to; } known_extensions[] = {
   { ".lz",  ""     },
@@ -99,20 +98,22 @@ bool delete_output_on_interrupt = false;
 void show_help( const long num_online )
   {
   std::printf( "Plzip is a massively parallel (multi-threaded) implementation of lzip, fully\n"
-               "compatible with lzip 1.4 or newer. Plzip uses the lzlib compression library.\n"
-               "\nLzip is a lossless data compressor with a user interface similar to the\n"
-               "one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0)\n"
-               "or compress most files more than bzip2 (lzip -9). Decompression speed is\n"
-               "intermediate between gzip and bzip2. Lzip is better than gzip and bzip2\n"
-               "from a data recovery perspective. Lzip has been designed, written and\n"
-               "tested with great care to replace gzip and bzip2 as the standard\n"
-               "general-purpose compressed format for unix-like systems.\n"
-               "\nPlzip can compress/decompress large files on multiprocessor machines\n"
-               "much faster than lzip, at the cost of a slightly reduced compression\n"
-               "ratio (0.4 to 2 percent larger compressed files). Note that the number\n"
-               "of usable threads is limited by file size; on files larger than a few GB\n"
-               "plzip can use hundreds of processors, but on files of only a few MB\n"
-               "plzip is no faster than lzip.\n"
+               "compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.\n"
+               "\nLzip is a lossless data compressor with a user interface similar to the one\n"
+               "of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov\n"
+               "chain-Algorithm' (LZMA) stream format, chosen to maximize safety and\n"
+               "interoperability. Lzip can compress about as fast as gzip (lzip -0) or\n"
+               "compress most files more than bzip2 (lzip -9). Decompression speed is\n"
+               "intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from\n"
+               "a data recovery perspective. Lzip has been designed, written, and tested\n"
+               "with great care to replace gzip and bzip2 as the standard general-purpose\n"
+               "compressed format for unix-like systems.\n"
+               "\nPlzip can compress/decompress large files on multiprocessor machines much\n"
+               "faster than lzip, at the cost of a slightly reduced compression ratio (0.4\n"
+               "to 2 percent larger compressed files). Note that the number of usable\n"
+               "threads is limited by file size; on files larger than a few GB plzip can use\n"
+               "hundreds of processors, but on files of only a few MB plzip is no faster\n"
+               "than lzip.\n"
                "\nUsage: %s [options] [files]\n", invocation_name );
   std::printf( "\nOptions:\n"
                "  -h, --help                     display this help and exit\n"
@@ -127,7 +128,7 @@ void show_help( const long num_online )
                "  -l, --list                     print (un)compressed file sizes\n"
                "  -m, --match-length=<bytes>     set match length limit in bytes [36]\n"
                "  -n, --threads=<n>              set number of (de)compression threads [%ld]\n"
-               "  -o, --output=<file>            if reading standard input, write to <file>\n"
+               "  -o, --output=<file>            write to <file>, keep input files\n"
                "  -q, --quiet                    suppress all messages\n"
                "  -s, --dictionary-size=<bytes>  set dictionary size limit in bytes [8 MiB]\n"
                "  -t, --test                     test compressed file integrity\n"
@@ -138,12 +139,13 @@ void show_help( const long num_online )
                "      --loose-trailing           allow trailing data seeming corrupt header\n"
                "      --in-slots=<n>             number of 1 MiB input packets buffered [4]\n"
                "      --out-slots=<n>            number of 1 MiB output packets buffered [64]\n"
-               , num_online );
+               "      --check-lib                compare version of lzlib.h with liblz.{a,so}\n",
+               num_online );
   if( verbosity >= 1 )
     {
-    std::printf( "      --debug=<level>            (0-1) print debug statistics to stderr\n" );
+    std::printf( "      --debug=<level>            print mode(2), debug statistics(1) to stderr\n" );
     }
-  std::printf( "If no file names are given, or if a file is '-', plzip compresses or\n"
+  std::printf( "\nIf no file names are given, or if a file is '-', plzip compresses or\n"
                "decompresses from standard input to standard output.\n"
                "Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,\n"
                "Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...\n"
@@ -151,8 +153,10 @@ void show_help( const long num_online )
                "to 2^29 bytes.\n"
                "\nThe bidimensional parameter space of LZMA can't be mapped to a linear\n"
                "scale optimal for all files. If your files are large, very repetitive,\n"
-               "etc, you may need to use the --dictionary-size and --match-length\n"
-               "options directly to achieve optimal performance.\n"
+               "etc, you may need to use the options --dictionary-size and --match-length\n"
+               "directly to achieve optimal performance.\n"
+               "\nTo extract all the files from archive 'foo.tar.lz', use the commands\n"
+               "'tar -xf foo.tar.lz' or 'plzip -cd foo.tar.lz | tar -xf -'.\n"
                "\nExit status: 0 for a normal exit, 1 for environmental problems (file\n"
                "not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or\n"
                "invalid input file, 3 for an internal consistency error (eg, bug) which\n"
@@ -173,6 +177,37 @@ void show_version()
                "There is NO WARRANTY, to the extent permitted by law.\n" );
   }

+int check_lib()
+  {
+  bool warning = false;
+  if( std::strcmp( LZ_version_string, LZ_version() ) != 0 )
+    { warning = true;
+      if( verbosity >= 0 )
+        std::printf( "warning: LZ_version_string != LZ_version() (%s vs %s)\n",
+                     LZ_version_string, LZ_version() ); }
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+  if( LZ_API_VERSION != LZ_api_version() )
+    { warning = true;
+      if( verbosity >= 0 )
+        std::printf( "warning: LZ_API_VERSION != LZ_api_version() (%u vs %u)\n",
+                     LZ_API_VERSION, LZ_api_version() ); }
+#endif
+  if( verbosity >= 1 )
+    {
+    std::printf( "Using lzlib %s\n", LZ_version() );
+#if !defined LZ_API_VERSION
+    std::fputs( "LZ_API_VERSION is not defined.\n", stdout );
+#elif LZ_API_VERSION >= 1012
+    std::printf( "Using LZ_API_VERSION = %u\n", LZ_api_version() );
+#else
+    std::printf( "Compiled with LZ_API_VERSION = %u. "
+                 "Using an unknown LZ_API_VERSION\n", LZ_API_VERSION );
+#endif
+    }
+  return warning;
+  }
+
 } // end namespace

 void Pretty_print::operator()( const char * const msg ) const
@@ -220,7 +255,7 @@ const char * format_ds( const unsigned dictionary_size )
 void show_header( const unsigned dictionary_size )
   {
-  std::fprintf( stderr, "dictionary %s, ", format_ds( dictionary_size ) );
+  std::fprintf( stderr, "dict %s, ", format_ds( dictionary_size ) );
   }

 namespace {
@@ -313,10 +348,14 @@ int extension_index( const std::string & name )
   }

-void set_c_outname( const std::string & name, const bool force_ext )
+void set_c_outname( const std::string & name, const bool filenames_given,
+                    const bool force_ext )
   {
+  /* zupdate < 1.9 depends on lzip adding the extension '.lz' to name when
+     reading from standard input. */
   output_filename = name;
-  if( force_ext || extension_index( output_filename ) < 0 )
+  if( force_ext ||
+      ( !filenames_given && extension_index( output_filename ) < 0 ) )
     output_filename += known_extensions[0].from;
   }
@@ -342,7 +381,7 @@ void set_d_outname( const std::string & name, const int eindex )
 } // end namespace

 int open_instream( const char * const name, struct stat * const in_statsp,
-                   const bool no_ofile, const bool reg_only )
+                   const bool one_to_one, const bool reg_only )
   {
   int infd = open( name, O_RDONLY | O_BINARY );
   if( infd < 0 )
@@ -354,13 +393,12 @@ int open_instream( const char * const name, struct stat * const in_statsp,
   const bool can_read = ( i == 0 && !reg_only &&
                           ( S_ISBLK( mode ) || S_ISCHR( mode ) ||
                             S_ISFIFO( mode ) || S_ISSOCK( mode ) ) );
-  if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || !no_ofile ) ) )
+  if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || one_to_one ) ) )
     {
     if( verbosity >= 0 )
       std::fprintf( stderr, "%s: Input file '%s' is not a regular file%s.\n",
-                    program_name, name,
-                    ( can_read && !no_ofile ) ?
-                    ",\n       and '--stdout' was not specified" : "" );
+                    program_name, name, ( can_read && one_to_one ) ?
+                    ",\n       and neither '-c' nor '-o' were specified" : "" );
     close( infd );
     infd = -1;
     }
@@ -372,7 +410,7 @@ namespace {

 int open_instream2( const char * const name, struct stat * const in_statsp,
                     const Mode program_mode, const int eindex,
-                    const bool recompress, const bool to_stdout )
+                    const bool one_to_one, const bool recompress )
   {
   if( program_mode == m_compress && !recompress && eindex >= 0 )
     {
@@ -381,16 +419,15 @@ int open_instream2( const char * const name, struct stat * const in_statsp,
                    program_name, name, known_extensions[eindex].from );
     return -1;
     }
-  const bool no_ofile = ( to_stdout || program_mode == m_test );
-  return open_instream( name, in_statsp, no_ofile, false );
+  return open_instream( name, in_statsp, one_to_one, false );
   }

-bool open_outstream( const bool force, const bool from_stdin )
+bool open_outstream( const bool force, const bool protect )
   {
   const mode_t usr_rw = S_IRUSR | S_IWUSR;
   const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH;
-  const mode_t outfd_mode = from_stdin ? all_rw : usr_rw;
+  const mode_t outfd_mode = protect ? usr_rw : all_rw;
   int flags = O_CREAT | O_WRONLY | O_BINARY;
   if( force ) flags |= O_TRUNC; else flags |= O_EXCL;
@@ -409,25 +446,6 @@ bool open_outstream( const bool force, const bool from_stdin )
   }

-bool check_tty( const char * const input_filename, const int infd,
-                const Mode program_mode )
-  {
-  if( program_mode == m_compress && isatty( outfd ) )
-    {
-    show_error( "I won't write compressed data to a terminal.", 0, true );
-    return false;
-    }
-  if( ( program_mode == m_decompress || program_mode == m_test ) &&
-      isatty( infd ) )
-    {
-    show_file_error( input_filename,
-                     "I won't read compressed data from a terminal." );
-    return false;
-    }
-  return true;
-  }
-
 void set_signals( void (*action)(int) )
   {
   std::signal( SIGHUP, action );
@@ -437,10 +455,10 @@ void set_signals( void (*action)(int) )
 } // end namespace

-// This can be called from any thread, main thread or sub-threads alike,
-// since they all call common helper functions that call cleanup_and_fail()
-// in case of an error.
+/* This can be called from any thread, main thread or sub-threads alike,
+   since they all call common helper functions like 'xlock' that call
+   cleanup_and_fail() in case of an error. */
 void cleanup_and_fail( const int retval )
   {
   // only one thread can delete and exit
@@ -474,7 +492,31 @@ extern "C" void signal_handler( int )
   }

+bool check_tty_in( const char * const input_filename, const int infd,
+                   const Mode program_mode, int & retval )
+  {
+  if( ( program_mode == m_decompress || program_mode == m_test ) &&
+      isatty( infd ) )				// for example /dev/tty
+    { show_file_error( input_filename,
+                       "I won't read compressed data from a terminal." );
+      close( infd ); set_retval( retval, 1 );
+      if( program_mode != m_test ) cleanup_and_fail( retval );
+      return false; }
+  return true;
+  }
+
+bool check_tty_out( const Mode program_mode )
+  {
+  if( program_mode == m_compress && isatty( outfd ) )
+    { show_file_error( output_filename.size() ?
+                       output_filename.c_str() : "(stdout)",
+                       "I won't write compressed data to a terminal." );
+      return false; }
+  return true;
+  }
+
-// Set permissions, owner and times.
+// Set permissions, owner, and times.
 void close_and_set_permissions( const struct stat * const in_statsp )
   {
   bool warning = false;
@@ -622,13 +664,9 @@ int main( const int argc, const char * const argv[] )
   bool loose_trailing = false;
   bool recompress = false;
   bool to_stdout = false;
-  invocation_name = argv[0];
+  if( argc > 0 ) invocation_name = argv[0];

-  if( LZ_version()[0] < '1' )
-    { show_error( "Bad library version. At least lzlib 1.0 is required." );
-      return 1; }
-
-  enum { opt_dbg = 256, opt_in, opt_lt, opt_out };
+  enum { opt_chk = 256, opt_dbg, opt_in, opt_lt, opt_out };
   const Arg_parser::Option options[] =
     {
     { '0', "fast", Arg_parser::no },
@@ -660,11 +698,12 @@ int main( const int argc, const char * const argv[] )
     { 't', "test",              Arg_parser::no  },
     { 'v', "verbose",           Arg_parser::no  },
     { 'V', "version",           Arg_parser::no  },
+    { opt_chk, "check-lib",     Arg_parser::no  },
     { opt_dbg, "debug",         Arg_parser::yes },
     { opt_in,  "in-slots",      Arg_parser::yes },
     { opt_lt,  "loose-trailing", Arg_parser::no },
     { opt_out, "out-slots",     Arg_parser::yes },
-    { 0 , 0,                    Arg_parser::no  } };
+    { 0, 0,                     Arg_parser::no  } };

   const Arg_parser parser( argc, argv, options );
   if( parser.error().size() )			// bad option
@@ -702,7 +741,8 @@ int main( const int argc, const char * const argv[] )
                   getnum( arg, LZ_min_match_len_limit(),
                           LZ_max_match_len_limit() ); break;
         case 'n': num_workers = getnum( arg, 1, max_workers ); break;
-        case 'o': default_output_filename = sarg; break;
+        case 'o': if( sarg == "-" ) to_stdout = true;
+                  else { default_output_filename = sarg; } break;
         case 'q': verbosity = -1; break;
         case 's': encoder_options.dictionary_size = get_dict_size( arg );
                   break;
@@ -710,6 +750,7 @@ int main( const int argc, const char * const argv[] )
         case 't': set_mode( program_mode, m_test ); break;
         case 'v': if( verbosity < 4 ) ++verbosity; break;
         case 'V': show_version(); return 0;
+        case opt_chk: return check_lib();
         case opt_dbg: debug_level = getnum( arg, 0, 3 ); break;
         case opt_in: in_slots = getnum( arg, 1, 64 ); break;
         case opt_lt: loose_trailing = true; break;
@@ -718,6 +759,10 @@ int main( const int argc, const char * const argv[] )
       }
     } // end process options

+  if( LZ_version()[0] < '1' )
+    { show_error( "Wrong library version. At least lzlib 1.0 is required." );
+      return 1; }
+
 #if defined(__MSVCRT__) || defined(__OS2__)
   setmode( STDIN_FILENO, O_BINARY );
   setmode( STDOUT_FILENO, O_BINARY );
@@ -734,9 +779,6 @@ int main( const int argc, const char * const argv[] )
   if( program_mode == m_list )
     return list_files( filenames, ignore_trailing, loose_trailing );

-  if( program_mode == m_test )
-    outfd = -1;
-
   const bool fast = encoder_options.dictionary_size == 65535 &&
                     encoder_options.match_len_limit == 16;
   if( data_size <= 0 )
@@ -762,112 +804,99 @@ int main( const int argc, const char * const argv[] )
     num_workers = std::min( num_online, max_workers );
     }

-  if( !to_stdout && program_mode != m_test &&
-      ( filenames_given || default_output_filename.size() ) )
+  if( program_mode == m_test ) to_stdout = false;	// apply overrides
+  if( program_mode == m_test || to_stdout ) default_output_filename.clear();
+
+  if( to_stdout && program_mode != m_test )	// check tty only once
+    { outfd = STDOUT_FILENO; if( !check_tty_out( program_mode ) ) return 1; }
+  else outfd = -1;
+
+  const bool to_file = !to_stdout && program_mode != m_test &&
+                       default_output_filename.size();
+  if( !to_stdout && program_mode != m_test && ( filenames_given || to_file ) )
     set_signals( signal_handler );

   Pretty_print pp( filenames );

   int failed_tests = 0;
   int retval = 0;
+  const bool one_to_one = !to_stdout && program_mode != m_test && !to_file;
   bool stdin_used = false;
   for( unsigned i = 0; i < filenames.size(); ++i )
     {
     std::string input_filename;
     int infd;
     struct stat in_stats;
-    output_filename.clear();

-    if( filenames[i].empty() || filenames[i] == "-" )
+    pp.set_name( filenames[i] );
+    if( filenames[i] == "-" )
       {
       if( stdin_used ) continue; else stdin_used = true;
       infd = STDIN_FILENO;
-      if( program_mode != m_test )
-        {
-        if( to_stdout || default_output_filename.empty() )
-          outfd = STDOUT_FILENO;
-        else
-          {
-          if( program_mode == m_compress )
-            set_c_outname( default_output_filename, false );
-          else output_filename = default_output_filename;
-          if( !open_outstream( force, true ) )
-            {
-            if( retval < 1 ) retval = 1;
-            close( infd );
-            continue;
-            }
-          }
-        }
+      if( !check_tty_in( pp.name(), infd, program_mode, retval ) ) continue;
+      if( one_to_one ) { outfd = STDOUT_FILENO; output_filename.clear(); }
       }
     else
       {
       const int eindex = extension_index( input_filename = filenames[i] );
       infd = open_instream2( input_filename.c_str(), &in_stats, program_mode,
-                             eindex, recompress, to_stdout );
-      if( infd < 0 ) { if( retval < 1 ) retval = 1; continue; }
-      if( program_mode != m_test )
-        {
-        if( to_stdout ) outfd = STDOUT_FILENO;
-        else
+                             eindex, one_to_one, recompress );
+      if( infd < 0 ) { set_retval( retval, 1 ); continue; }
+      if( !check_tty_in( pp.name(), infd, program_mode, retval ) ) continue;
+      if( one_to_one )			// open outfd after verifying infd
         {
         if( program_mode == m_compress )
-          set_c_outname( input_filename, true );
+          set_c_outname( input_filename, true, true );
         else set_d_outname( input_filename, eindex );
-        if( !open_outstream( force, false ) )
-          {
-          if( retval < 1 ) retval = 1;
-          close( infd );
-          continue;
-          }
+        if( !open_outstream( force, true ) )
+          { close( infd ); set_retval( retval, 1 ); continue; }
         }
       }
-    pp.set_name( input_filename );
-    if( !check_tty( pp.name(), infd, program_mode ) )
+    if( one_to_one && !check_tty_out( program_mode ) )
+      { set_retval( retval, 1 ); return retval; }	// don't delete a tty
+    if( to_file && outfd < 0 )		// open outfd after verifying infd
       {
-      if( retval < 1 ) retval = 1;
-      if( program_mode == m_test ) { close( infd ); continue; }
-      cleanup_and_fail( retval );
+      if( program_mode == m_compress ) set_c_outname( default_output_filename,
+                                         filenames_given, false );
+      else output_filename = default_output_filename;
+      if( !open_outstream( force, false ) || !check_tty_out( program_mode ) )
+        return 1;	// check tty only once and don't try to delete a tty
       }

-    const struct stat * const in_statsp = input_filename.size() ? &in_stats : 0;
-    const bool infd_isreg = in_statsp && S_ISREG( in_statsp->st_mode );
+    const struct stat * const in_statsp =
+      ( input_filename.size() && one_to_one ) ? &in_stats : 0;
+    const bool infd_isreg = input_filename.size() && S_ISREG( in_stats.st_mode );
     const unsigned long long cfile_size =
-      infd_isreg ? ( in_statsp->st_size + 99 ) / 100 : 0;
+      infd_isreg ? ( in_stats.st_size + 99 ) / 100 : 0;
     int tmp;
     if( program_mode == m_compress )
       tmp = compress( cfile_size, data_size, encoder_options.dictionary_size,
-                      encoder_options.match_len_limit,
-                      num_workers, infd, outfd, pp, debug_level );
+                      encoder_options.match_len_limit, num_workers,
+                      infd, outfd, pp, debug_level );
     else
-      tmp = decompress( cfile_size, num_workers, infd, outfd, pp, debug_level,
-                        in_slots, out_slots, ignore_trailing, loose_trailing,
-                        infd_isreg );
+      tmp = decompress( cfile_size, num_workers, infd, outfd, pp,
+                        debug_level, in_slots, out_slots, ignore_trailing,
+                        loose_trailing, infd_isreg, one_to_one );
     if( close( infd ) != 0 )
-      {
-      show_error( input_filename.size() ? "Error closing input file" :
-                  "Error closing stdin", errno );
-      if( tmp < 1 ) tmp = 1;
-      }
-    if( tmp > retval ) retval = tmp;
+      { show_file_error( pp.name(), "Error closing input file", errno );
+        set_retval( tmp, 1 ); }
+    set_retval( retval, tmp );
     if( tmp )
       { if( program_mode != m_test ) cleanup_and_fail( retval );
         else ++failed_tests; }
-    if( delete_output_on_interrupt )
+    if( delete_output_on_interrupt && one_to_one )
       close_and_set_permissions( in_statsp );
-    if( input_filename.size() )
-      {
-      if( !keep_input_files && !to_stdout && program_mode != m_test )
-        std::remove( input_filename.c_str() );
-      }
+    if( input_filename.size() && !keep_input_files && one_to_one )
+      std::remove( input_filename.c_str() );
     }
-  if( outfd >= 0 && close( outfd ) != 0 )
+  if( delete_output_on_interrupt ) close_and_set_permissions( 0 );	// -o
+  else if( outfd >= 0 && close( outfd ) != 0 )				// -c
     {
     show_error( "Error closing stdout", errno );
-    if( retval < 1 ) retval = 1;
+    set_retval( retval, 1 );
     }
   if( failed_tests > 0 && verbosity >= 1 && filenames.size() > 1 )
     std::fprintf( stderr, "%s: warning: %d %s failed the test.\n",

@@ -1,9 +1,9 @@
 #! /bin/sh
 # check script for Plzip - Massively parallel implementation of lzip
-# Copyright (C) 2009-2019 Antonio Diaz Diaz.
+# Copyright (C) 2009-2021 Antonio Diaz Diaz.
 #
 # This script is free software: you have unlimited permission
-# to copy, distribute and modify it.
+# to copy, distribute, and modify it.
 LC_ALL=C
 export LC_ALL
@@ -30,6 +30,7 @@ cd "${objdir}"/tmp || framework_failure
 cat "${testdir}"/test.txt > in || framework_failure
 in_lz="${testdir}"/test.txt.lz
+in_em="${testdir}"/test_em.txt.lz
 fail=0
 lwarn8=0
 lwarn10=0
@@ -41,6 +42,7 @@ lzlib_1_10() { [ ${lwarn10} = 0 ] &&
 printf "\nwarning: header HD=3 detection requires lzlib 1.10 or newer"
 lwarn10=1 ; }
+"${LZIP}" --check-lib   # just print warning
 printf "testing plzip-%s..." "$2"
 "${LZIP}" -fkqm4 in
@@ -66,6 +68,14 @@ done
 [ $? = 2 ] || test_failed $LINENO
 "${LZIP}" -dq -o in < "${in_lz}"
 [ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -dq -o in "${in_lz}"
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -dq -o out nx_file.lz
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e out ] || test_failed $LINENO
+"${LZIP}" -q -o out.lz nx_file
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e out.lz ] || test_failed $LINENO
 # these are for code coverage
 "${LZIP}" -lt "${in_lz}" 2> /dev/null
 [ $? = 1 ] || test_failed $LINENO
@@ -73,7 +83,9 @@ done
 [ $? = 1 ] || test_failed $LINENO
 "${LZIP}" -cdt "${in_lz}" > out 2> /dev/null
 [ $? = 1 ] || test_failed $LINENO
-"${LZIP}" -t -- nx_file 2> /dev/null
+"${LZIP}" -t -- nx_file.lz 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -t "" < /dev/null 2> /dev/null
 [ $? = 1 ] || test_failed $LINENO
 "${LZIP}" --help > /dev/null || test_failed $LINENO
 "${LZIP}" -n1 -V > /dev/null || test_failed $LINENO
@@ -97,12 +109,26 @@ printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null
 printf "\ntesting decompression..."
-"${LZIP}" -lq "${in_lz}" || test_failed $LINENO
-"${LZIP}" -t "${in_lz}" || test_failed $LINENO
-"${LZIP}" -cd "${in_lz}" > copy || test_failed $LINENO
-cmp in copy || test_failed $LINENO
-rm -f copy || framework_failure
+for i in "${in_lz}" "${in_em}" ; do
+  "${LZIP}" -lq "$i" || test_failed $LINENO "$i"
+  "${LZIP}" -t "$i" || test_failed $LINENO "$i"
+  "${LZIP}" -d "$i" -o copy || test_failed $LINENO "$i"
+  cmp in copy || test_failed $LINENO "$i"
+  "${LZIP}" -cd "$i" > copy || test_failed $LINENO "$i"
+  cmp in copy || test_failed $LINENO "$i"
+  "${LZIP}" -d "$i" -o - > copy || test_failed $LINENO "$i"
+  cmp in copy || test_failed $LINENO "$i"
+  "${LZIP}" -d < "$i" > copy || test_failed $LINENO "$i"
+  cmp in copy || test_failed $LINENO "$i"
+  rm -f copy || framework_failure
+done
+lines=$("${LZIP}" -tvv "${in_em}" 2>&1 | wc -l) || test_failed $LINENO
+[ "${lines}" -eq 1 ] || test_failed $LINENO "${lines}"
+lines=$("${LZIP}" -lvv "${in_em}" | wc -l) || test_failed $LINENO
+[ "${lines}" -eq 11 ] || test_failed $LINENO "${lines}"
 cat "${in_lz}" > copy.lz || framework_failure
 "${LZIP}" -dk copy.lz || test_failed $LINENO
 cmp in copy || test_failed $LINENO
@@ -113,19 +139,19 @@ printf "to be overwritten" > copy || framework_failure
 [ ! -e copy.lz ] || test_failed $LINENO
 cmp in copy || test_failed $LINENO
+rm -f copy || framework_failure
+cat "${in_lz}" > copy.lz || framework_failure
+"${LZIP}" -d -S100k copy.lz || test_failed $LINENO   # ignore -S
+[ ! -e copy.lz ] || test_failed $LINENO
+cmp in copy || test_failed $LINENO
 printf "to be overwritten" > copy || framework_failure
 "${LZIP}" -df -o copy < "${in_lz}" || test_failed $LINENO
 cmp in copy || test_failed $LINENO
+rm -f out copy || framework_failure
+"${LZIP}" -d -o ./- "${in_lz}" || test_failed $LINENO
+cmp in ./- || test_failed $LINENO
+rm -f ./- || framework_failure
+"${LZIP}" -d -o ./- < "${in_lz}" || test_failed $LINENO
+cmp in ./- || test_failed $LINENO
+rm -f ./- || framework_failure
-rm -f copy || framework_failure
-"${LZIP}" < in > anyothername || test_failed $LINENO
-"${LZIP}" -dv --output copy - anyothername - < "${in_lz}" 2> /dev/null ||
+cat "${in_lz}" > anyothername || framework_failure
+"${LZIP}" -dv - anyothername - < "${in_lz}" > copy 2> /dev/null ||
 test_failed $LINENO
 cmp in copy || test_failed $LINENO
 cmp in anyothername.out || test_failed $LINENO
@@ -166,21 +192,19 @@ done
 cmp in copy || test_failed $LINENO
 cat in in > in2 || framework_failure
-cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
-"${LZIP}" -lq in2.lz || test_failed $LINENO
-"${LZIP}" -t in2.lz || test_failed $LINENO
-"${LZIP}" -cd in2.lz > copy2 || test_failed $LINENO
+"${LZIP}" -lq "${in_lz}" "${in_lz}" || test_failed $LINENO
+"${LZIP}" -t "${in_lz}" "${in_lz}" || test_failed $LINENO
+"${LZIP}" -cd "${in_lz}" "${in_lz}" -o out > copy2 || test_failed $LINENO
+[ ! -e out ] || test_failed $LINENO   # override -o
 cmp in2 copy2 || test_failed $LINENO
-"${LZIP}" --output=copy2.lz < in2 || test_failed $LINENO
-"${LZIP}" -lq copy2.lz || test_failed $LINENO
-"${LZIP}" -t copy2.lz || test_failed $LINENO
-"${LZIP}" -cd copy2.lz > copy2 || test_failed $LINENO
+rm -f copy2 || framework_failure
+"${LZIP}" -d "${in_lz}" "${in_lz}" -o copy2 || test_failed $LINENO
 cmp in2 copy2 || test_failed $LINENO
+rm -f copy2 || framework_failure
+cat "${in_lz}" "${in_lz}" > copy2.lz || framework_failure
 printf "\ngarbage" >> copy2.lz || framework_failure
 "${LZIP}" -tvvvv copy2.lz 2> /dev/null || test_failed $LINENO
-rm -f copy2 || framework_failure
 "${LZIP}" -alq copy2.lz
 [ $? = 2 ] || test_failed $LINENO
 "${LZIP}" -atq copy2.lz
@@ -202,37 +226,46 @@ printf "\ntesting compression..."
 "${LZIP}" -cf "${in_lz}" > out 2> /dev/null   # /dev/null is a tty on OS/2
 [ $? = 1 ] || test_failed $LINENO
-"${LZIP}" -cFvvm36 "${in_lz}" > out 2> /dev/null || test_failed $LINENO
+"${LZIP}" -Fvvm36 -o - "${in_lz}" > out 2> /dev/null || test_failed $LINENO
 "${LZIP}" -cd out | "${LZIP}" -d > copy || test_failed $LINENO
 cmp in copy || test_failed $LINENO
+"${LZIP}" -0 -o ./- in || test_failed $LINENO
+"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO
+rm -f ./- || framework_failure
+"${LZIP}" -0 -o ./- < in || test_failed $LINENO   # add .lz
+[ ! -e ./- ] || test_failed $LINENO
+"${LZIP}" -cd -- -.lz | cmp in - || test_failed $LINENO
+rm -f ./-.lz || framework_failure
 for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
   "${LZIP}" -k -$i in || test_failed $LINENO $i
   mv -f in.lz copy.lz || test_failed $LINENO $i
   printf "garbage" >> copy.lz || framework_failure
   "${LZIP}" -df copy.lz || test_failed $LINENO $i
   cmp in copy || test_failed $LINENO $i
-done
-for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
-  "${LZIP}" -c -$i in > out || test_failed $LINENO $i
+  "${LZIP}" -$i in -c > out || test_failed $LINENO $i
+  "${LZIP}" -$i in -o o_out || test_failed $LINENO $i   # don't add .lz
+  [ ! -e o_out.lz ] || test_failed $LINENO
+  cmp out o_out || test_failed $LINENO $i
+  rm -f o_out || framework_failure
   printf "g" >> out || framework_failure
   "${LZIP}" -cd out > copy || test_failed $LINENO $i
   cmp in copy || test_failed $LINENO $i
-done
-for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
   "${LZIP}" -$i < in > out || test_failed $LINENO $i
   "${LZIP}" -d < out > copy || test_failed $LINENO $i
   cmp in copy || test_failed $LINENO $i
-done
-for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
-  "${LZIP}" -f -$i -o out < in || test_failed $LINENO $i
+  rm -f out || framework_failure
+  printf "to be overwritten" > out.lz || framework_failure
+  "${LZIP}" -f -$i -o out < in || test_failed $LINENO $i   # add .lz
+  [ ! -e out ] || test_failed $LINENO
   "${LZIP}" -df -o copy < out.lz || test_failed $LINENO $i
   cmp in copy || test_failed $LINENO $i
 done
-rm -f out.lz || framework_failure
+rm -f out out.lz || framework_failure
 cat in in in in > in4 || framework_failure
 for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ; do
@@ -317,6 +350,13 @@ else
 fi
 rm -f int.lz || framework_failure
+for i in fox_v2.lz fox_s11.lz fox_de20.lz \
+         fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do
+  "${LZIP}" -tq "${testdir}"/$i
+  [ $? = 2 ] || test_failed $LINENO $i
+done
+cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
 cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure
 if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
    [ -e trunc.lz ] && cmp in2.lz trunc.lz > /dev/null 2>&1 ; then
@@ -343,14 +383,22 @@ printf "g" >> ingin.lz || framework_failure
 cat "${in_lz}" >> ingin.lz || framework_failure
 "${LZIP}" -lq ingin.lz
 [ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq < ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -acdq ingin.lz > out
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -adq < ingin.lz > out
+[ $? = 2 ] || test_failed $LINENO
 "${LZIP}" -tq ingin.lz
 [ $? = 2 ] || test_failed $LINENO
-"${LZIP}" -cdq ingin.lz > out
-[ $? = 2 ] || test_failed $LINENO
 "${LZIP}" -t < ingin.lz || test_failed $LINENO
+"${LZIP}" -cdq ingin.lz > copy
+[ $? = 2 ] || test_failed $LINENO
 "${LZIP}" -d < ingin.lz > copy || test_failed $LINENO
 cmp in copy || test_failed $LINENO
-rm -f copy ingin.lz || framework_failure
+rm -f copy ingin.lz out || framework_failure
 echo
 if [ ${fail} = 0 ] ; then

New binary files (contents not shown):

  testsuite/fox_bcrc.lz
  testsuite/fox_crc0.lz
  testsuite/fox_das46.lz
  testsuite/fox_de20.lz
  testsuite/fox_mes81.lz
  testsuite/fox_s11.lz
  testsuite/fox_v2.lz
  testsuite/test_em.txt.lz