Merging upstream version 1.9.

Signed-off-by: Daniel Baumann <daniel@debian.org>

parent c7dcd442c7
commit 48c5ddf50f

29 changed files with 2003 additions and 1566 deletions

ChangeLog | 104
@@ -1,15 +1,42 @@
2021-01-03 Antonio Diaz Diaz <antonio@gnu.org>

* Version 1.9 released.
* main.cc (main): Report an error if a file name is empty.
  Make '-o' behave like '-c', but writing to file instead of stdout.
  Make '-c' and '-o' check whether the output is a terminal only once.
  Do not open output if input is a terminal.
* main.cc: New option '--check-lib'.
* Replace 'decompressed', 'compressed' with 'out', 'in' in output.
* decompress.cc, dec_stream.cc, dec_stdout.cc:
  Continue testing if any input file fails the test.
  Show the largest dictionary size in a multimember file.
* main.cc: Show final diagnostic when testing multiple files.
* decompress.cc, dec_stream.cc [LZ_API_VERSION >= 1012]: Avoid
  copying decompressed data when testing with lzlib 1.12 or newer.
* compress.cc, dec_stream.cc: Start only the worker threads required.
* dec_stream.cc: Splitter stops reading when trailing data is found.
  Don't include trailing data in the compressed size shown.
  Use plain comparison instead of Boyer-Moore to search for headers.
* lzip_index.cc: Improve messages for corruption in last header.
* decompress.cc: Shorten messages 'Data error' and 'Unexpected EOF'.
* main.cc: Set a valid invocation_name even if argc == 0.
* Document extraction from tar.lz in manual, '--help', and man page.
* plzip.texi (Introduction): Mention tarlz as an alternative.
* plzip.texi: Several fixes and improvements.
* testsuite: Add 8 new test files.

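The 2021 entry above switches the header search from Boyer-Moore to plain comparison. The sketch below is a rough illustration, not plzip's code. It scans a buffer for a candidate member header and assumes a candidate is simply the magic string "LZIP" followed by version byte 1. The function name find_header is hypothetical, and a real splitter would also validate the rest of the header. Boyer-Moore's skip tables tend to pay off on longer patterns, so for a five-byte pattern a plain scan costs little and is much simpler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch: return the offset of the first candidate lzip member header in
// buf[from, size), or `size` if there is none. A candidate is the magic
// string "LZIP" followed by version byte 1; nothing else is validated here.
std::size_t find_header(const std::uint8_t* buf, std::size_t size,
                        std::size_t from)
{
  static const std::uint8_t pattern[5] = { 'L', 'Z', 'I', 'P', 1 };
  const std::size_t len = sizeof pattern;
  for (std::size_t pos = from; pos + len <= size; ++pos)
    // Plain comparison at every position; the first-byte test only avoids
    // calling memcmp where a match is impossible.
    if (buf[pos] == pattern[0] && std::memcmp(buf + pos, pattern, len) == 0)
      return pos;
  return size;
}
```

To find the next candidate, call the function again with `from` set to the previous result plus one.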
2019-01-05 Antonio Diaz Diaz <antonio@gnu.org>

* Version 1.8 released.
* File_* renamed to Lzip_*.
* main.cc: Added new options '--in-slots' and '--out-slots'.
* main.cc: Increased default in_slots per worker from 2 to 4.
* main.cc: Increased default out_slots per worker from 32 to 64.
* Rename File_* to Lzip_*.
* main.cc: New options '--in-slots' and '--out-slots'.
* main.cc: Increase default in_slots per worker from 2 to 4.
* main.cc: Increase default out_slots per worker from 32 to 64.
* lzip.h (Lzip_trailer): New function 'verify_consistency'.
* lzip_index.cc: Detect some kinds of corrupt trailers.
* main.cc (main): Check return value of close( infd ).
* plzip.texi: Improved description of '-0..-9', '-m' and '-s'.
* configure: Added new option '--with-mingw'.
* plzip.texi: Improve description of '-0..-9', '-m', and '-s'.
* configure: New option '--with-mingw'.
* configure: Accept appending to CXXFLAGS, 'CXXFLAGS+=OPTIONS'.
* INSTALL: Document use of CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'.

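The 2019 entry adds trailer consistency checks. Each lzip member ends with a 20-byte trailer holding three little-endian fields: the CRC32 of the uncompressed data, the uncompressed data size, and the member size. Because the member size includes the header and the trailer, a seekable multimember file can be indexed by walking backward from its end. The sketch below assumes exactly that layout. The function index_members and its three checks are hypothetical. They only illustrate the kind of inconsistency such code can catch, not what Lzip_trailer::verify_consistency actually tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sizes of the fixed parts of an lzip member.
const std::size_t header_size = 6;    // "LZIP", version, coded dictionary size
const std::size_t trailer_size = 20;  // CRC32 (4), data size (8), member size (8)

// Read a little-endian 64-bit value.
std::uint64_t read_le64(const std::uint8_t* p)
{
  std::uint64_t value = 0;
  for (int i = 7; i >= 0; --i) value = (value << 8) | p[i];
  return value;
}

// Sketch: start offsets of all members, in file order, found by walking
// backward from the end using the member size stored in each trailer.
// Returns an empty vector if any trailer looks inconsistent (and also for
// an empty file).
std::vector<std::size_t> index_members(const std::uint8_t* buf, std::size_t size)
{
  std::vector<std::size_t> starts;
  std::size_t end = size;
  while (end > 0) {
    if (end < header_size + trailer_size) return {};
    // The member size is the last field of the trailer.
    const std::uint64_t member_size = read_le64(buf + end - 8);
    // A member can't be shorter than header plus trailer, nor longer than
    // the data that precedes its end.
    if (member_size < header_size + trailer_size || member_size > end) return {};
    const std::size_t start = end - static_cast<std::size_t>(member_size);
    // The computed start must hold a member header.
    if (std::memcmp(buf + start, "LZIP", 4) != 0) return {};
    starts.insert(starts.begin(), start);
    end = start;
  }
  return starts;
}
```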
@@ -20,19 +47,19 @@
  packet queue by a circular buffer to reduce memory fragmentation.
* compress.cc: Return one empty packet at a time to reduce mem use.
* main.cc: Reduce threads on 32 bit systems to use under 2.22 GiB.
* main.cc: Added new option '--loose-trailing'.
* Improved corrupt header detection to HD=3 on seekable files.
* main.cc: New option '--loose-trailing'.
* Improve corrupt header detection to HD = 3 on seekable files.
  (On all files with lzlib 1.10 or newer).
* Replaced 'bits/byte' with inverse compression ratio in output.
* Replace 'bits/byte' with inverse compression ratio in output.
* Show progress of decompression at verbosity level 2 (-vv).
* Show progress of (de)compression only if stderr is a terminal.
* main.cc: Do not add a second .lz extension to the arg of -o.
* Show dictionary size at verbosity level 4 (-vvvv).
* main.cc (cleanup_and_fail): Suppress messages from other threads.
* list.cc: Added missing '#include <pthread.h>'.
* plzip.texi: Added chapter 'Output'.
* plzip.texi (Memory requirements): Added table.
* plzip.texi (Program design): Added a block diagram.
* list.cc: Add missing '#include <pthread.h>'.
* plzip.texi: New chapter 'Output'.
* plzip.texi (Memory requirements): Add table.
* plzip.texi (Program design): Add a block diagram.

2017-04-12 Antonio Diaz Diaz <antonio@gnu.org>

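This hunk replaces 'bits/byte' with an inverse compression ratio in the output. The sketch below shows the arithmetic behind such a summary under stated assumptions:

- "ratio" is the uncompressed size divided by the compressed size;
- the inverse ratio is the compressed size as a percentage of the uncompressed size;
- "saved" is the remainder.

Ratio_info, ratio_info, and format_ratio are hypothetical names, and the output format only approximates lzip-style verbose output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch: summary figures for one file, given its two sizes.
struct Ratio_info {
  double ratio;        // uncompressed / compressed, shown as "N:1"
  double inverse_pct;  // compressed / uncompressed, as a percentage
  double saved_pct;    // 100 - inverse_pct
};

Ratio_info ratio_info(std::uint64_t uncompressed, std::uint64_t compressed)
{
  Ratio_info r = { 0.0, 0.0, 0.0 };
  if (uncompressed == 0 || compressed == 0) return r;  // avoid division by zero
  r.ratio = static_cast<double>(uncompressed) / compressed;
  r.inverse_pct = 100.0 * compressed / uncompressed;
  r.saved_pct = 100.0 - r.inverse_pct;
  return r;
}

// Render the figures in an lzip-like verbose style (format is an assumption).
std::string format_ratio(const Ratio_info& r)
{
  char buf[64];
  std::snprintf(buf, sizeof buf, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved",
                r.ratio, r.inverse_pct, r.saved_pct);
  return buf;
}
```

For example, 1000 bytes compressed to 250 bytes give a ratio of 4.0, an inverse ratio of 25%, and 75% saved. format_ratio renders this as " 4.000:1, 25.00% ratio, 75.00% saved".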
@@ -41,15 +68,15 @@
* Don't allow mixing different operations (-d, -l or -t).
* main.cc: Continue testing if any input file is a terminal.
* lzip_index.cc: Improve detection of bad dict and trailing data.
* lzip.h: Unified messages for bad magic, trailing data, etc.
* lzip.h: Unify messages for bad magic, trailing data, etc.

2016-05-14 Antonio Diaz Diaz <antonio@gnu.org>

* Version 1.5 released.
* main.cc: Added new option '-a, --trailing-error'.
* main.cc: New option '-a, --trailing-error'.
* main.cc (main): Delete '--output' file if infd is a terminal.
* main.cc (main): Don't use stdin more than once.
* plzip.texi: Added chapters 'Trailing data' and 'Examples'.
* plzip.texi: New chapters 'Trailing data' and 'Examples'.
* configure: Avoid warning on some shells when testing for g++.
* Makefile.in: Detect the existence of install-info.
* check.sh: A POSIX shell is required to run the tests.
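The 2017 entry forbids mixing different operations such as -d, -l, and -t. The sketch below shows one minimal way to enforce this during option parsing. Mode and set_mode are hypothetical names. Accepting a repeated identical option (for example '-d -d') is an assumption of the sketch, not documented plzip behavior.

```cpp
#include <cassert>

// Hypothetical operation modes; compression is the default.
enum class Mode { compress, decompress, list, test };

// Sketch: record the operation requested by an option such as -d, -l, or -t.
// Repeating the same operation is accepted; asking for a second, different
// operation returns false so the caller can report a usage error.
bool set_mode(Mode& current, bool& mode_given, Mode requested)
{
  if (mode_given && current != requested) return false;
  current = requested;
  mode_given = true;
  return true;
}
```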
@@ -65,20 +92,20 @@
* Version 1.3 released.
* dec_stream.cc: Don't use output packets or muxer when testing.
* Make '-dvvv' and '-tvvv' show dictionary size like lzip.
* lzip.h: Added missing 'const' to the declaration of 'compress'.
* plzip.texi: Added chapters 'Memory requirements' and
* lzip.h: Add missing 'const' to the declaration of 'compress'.
* plzip.texi: New chapters 'Memory requirements' and
  'Minimum file sizes'.
* Makefile.in: Added new target 'install*-compress'.
* Makefile.in: New targets 'install*-compress'.

2014-08-29 Antonio Diaz Diaz <antonio@gnu.org>

* Version 1.2 released.
* main.cc (close_and_set_permissions): Behave like 'cp -p'.
* dec_stdout.cc dec_stream.cc: Make 'slot_av' a vector to limit
* dec_stdout.cc, dec_stream.cc: Make 'slot_av' a vector to limit
  the number of packets produced by each worker individually.
* plzip.texinfo: Renamed to plzip.texi.
* plzip.texi: Documented the approximate amount of memory required.
* License changed to GPL version 2 or later.
* plzip.texinfo: Rename to plzip.texi.
* plzip.texi: Document the approximate amount of memory required.
* Change license to GPL version 2 or later.

2013-09-17 Antonio Diaz Diaz <antonio@gnu.org>

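The 2014 entry makes 'slot_av' a vector so that the number of packets produced by each worker is limited individually. The sketch below illustrates that idea with C++11 std::mutex and std::condition_variable, not the pthreads calls plzip uses. Slot_budget and its methods are hypothetical. Each worker has its own counter of free slots, so a fast worker blocks on its own budget instead of using up slots the slower workers need.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Sketch: one counter of free output slots per worker.
class Slot_budget {
public:
  Slot_budget(std::size_t workers, int slots_per_worker)
    : free_(workers, slots_per_worker) {}

  // Worker `w` calls this before producing a packet; it blocks while that
  // worker has no free slot left.
  void acquire(std::size_t w)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    slot_freed_.wait(lock, [this, w] { return free_[w] > 0; });
    --free_[w];
  }

  // Non-blocking variant: returns false instead of waiting.
  bool try_acquire(std::size_t w)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (free_[w] <= 0) return false;
    --free_[w];
    return true;
  }

  // The consumer (for example a muxer) calls this after writing out one of
  // the packets produced by worker `w`.
  void release(std::size_t w)
  {
    { std::lock_guard<std::mutex> lock(mutex_); ++free_[w]; }
    // All workers share one condition variable, so wake them all and let
    // each one re-check its own counter.
    slot_freed_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable slot_freed_;
  std::vector<int> free_;
};
```

A single condition variable keeps the sketch short, at the price of spurious wake-ups. One condition variable per worker would wake only the worker whose slot was returned.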
@@ -89,14 +116,13 @@
2013-05-29 Antonio Diaz Diaz <antonio@gnu.org>

* Version 1.0 released.
* compress.cc: 'deliver_packet' changed to 'deliver_packets'.
* compress.cc: Change 'deliver_packet' to 'deliver_packets'.
* Scalability of decompression from/to regular files has been
  increased by removing splitter and muxer when not needed.
* The number of worker threads is now limited to the number of
  members when decompressing from a regular file.
* configure: Options now accept a separate argument.
* Makefile.in: Added new target 'install-as-lzip'.
* Makefile.in: Added new target 'install-bin'.
* Makefile.in: New targets 'install-as-lzip' and 'install-bin'.
* main.cc: Use 'setmode' instead of '_setmode' on Windows and OS/2.
* main.cc: Define 'strtoull' to 'std::strtoul' on Windows.

@@ -104,17 +130,17 @@

* Version 0.9 released.
* Minor fixes and cleanups.
* configure: 'datadir' renamed to 'datarootdir'.
* configure: Rename 'datadir' to 'datarootdir'.

2012-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>

* Version 0.8 released.
* main.cc: Added new option '-F, --recompress'.
* main.cc: New option '-F, --recompress'.
* decompress.cc (decompress): Show compression ratio.
* main.cc (close_and_set_permissions): Inability to change output
  file attributes has been downgraded from error to warning.
* Small change in '--help' output and man page.
* Changed quote characters in messages as advised by GNU Standards.
* Change quote characters in messages as advised by GNU Standards.
* main.cc: Set stdin/stdout in binary mode on OS2.
* compress.cc: Reduce memory use of compressed packets.
* decompress.cc: Use Boyer-Moore algorithm to search for headers.
@@ -128,15 +154,16 @@
  produced by workers to limit the amount of memory used.
* main.cc (open_instream): Don't show the message
  " and '--stdout' was not specified" for directories, etc.
* main.cc: Fixed warning about fchown return value being ignored.
* testsuite: 'test1' renamed to 'test.txt'. Added new tests.
  Exit with status 1 if any output file exists and is skipped.
* main.cc: Fix warning about fchown return value being ignored.
* testsuite: Rename 'test1' to 'test.txt'. New tests.

2010-03-20 Antonio Diaz Diaz <ant_diaz@teleline.es>

* Version 0.6 released.
* Small portability fixes.
* plzip.texinfo: Added chapter 'Program Design' and description
  of option '--threads'.
* plzip.texinfo: New chapter 'Program Design'.
  Add missing description of option '-n, --threads'.
* Debug stats have been fixed.

2010-02-10 Antonio Diaz Diaz <ant_diaz@teleline.es>
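One entry above makes plzip exit with status 1 when an output file exists and is skipped. The lzip family documents four exit status values:

- 0 for a normal exit;
- 1 for environmental problems (file not found, invalid flags, I/O errors, and so on);
- 2 for a corrupt or invalid input file;
- 3 for an internal consistency error.

The sketch below shows one way to combine per-file results when several files are processed, assuming the most severe status wins. Run_status is hypothetical. Counting failures also supports a final diagnostic like the one NEWS describes for 'plzip --test'.

```cpp
#include <cassert>

// Exit status values documented for lzip and plzip.
enum Exit_status { exit_ok = 0, exit_env = 1, exit_bad_input = 2, exit_internal = 3 };

// Sketch: accumulate per-file results so that the final exit status is the
// most severe one seen, while counting how many files failed.
struct Run_status {
  int retval = exit_ok;
  int failed_files = 0;

  void add(int file_status)
  {
    if (file_status != exit_ok) ++failed_files;
    if (file_status > retval) retval = file_status;
  }
};
```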
@@ -154,7 +181,7 @@
2010-01-24 Antonio Diaz Diaz <ant_diaz@teleline.es>

* Version 0.3 released.
* Implemented option '--data-size'.
* New option '-B, --data-size'.
* Output file is now removed if plzip is interrupted.
* This version automatically chooses the smallest possible
  dictionary size for each member during compression, saving
@@ -164,15 +191,14 @@
2010-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>

* Version 0.2 released.
* Implemented option '--dictionary-size'.
* Implemented option '--match-length'.
* New options '-s, --dictionary-size' and '-m, --match-length'.
* 'lacos_rbtree' has been replaced with a circular buffer.

2009-12-05 Antonio Diaz Diaz <ant_diaz@teleline.es>

* Version 0.1 released.
* This version is based on llzip-0.03 (2009-11-21), written by
  Laszlo Ersek <lacos@caesar.elte.hu>.
  Laszlo Ersek <lacos@caesar.elte.hu>. Thanks Laszlo!
  From llzip-0.03/README:

  llzip is a hack on my lbzip2-0.17 release. I ripped out the
@@ -184,8 +210,8 @@
  until something better appears on the net.


Copyright (C) 2009-2019 Antonio Diaz Diaz.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This file is a collection of facts, and thus it is not copyrightable,
but just in case, you have unlimited permission to copy, distribute and
but just in case, you have unlimited permission to copy, distribute, and
modify it.
INSTALL | 37

@@ -1,11 +1,15 @@
Requirements
------------
You will need a C++ compiler and the lzlib compression library installed.
I use gcc 5.3.0 and 4.1.2, but the code should compile with any standards
You will need a C++11 compiler and the compression library lzlib installed.
(gcc 3.3.6 or newer is recommended).
I use gcc 6.1.0 and 4.1.2, but the code should compile with any standards
compliant compiler.
Lzlib must be version 1.0 or newer, but the fast encoder is only available
in lzlib 1.7 or newer, and the HD = 3 detection of corrupt headers on
non-seekable multimember files is only available in lzlib 1.10 or newer.

Lzlib must be version 1.0 or newer, but the fast encoder requires lzlib 1.7
or newer, the Hamming distance (HD) = 3 detection of corrupt headers in
non-seekable multimember files requires lzlib 1.10 or newer, and the 'no
copy' optimization for testing requires lzlib 1.12 or newer.

Gcc is available at http://gcc.gnu.org.
Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.

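The requirements above tie three optional features to minimum lzlib versions. The sketch below turns a "major.minor" version string into a number and derives those features from it. Encoding the version as major * 1000 + minor matches the 'LZ_API_VERSION >= 1012' test for lzlib 1.12 quoted in the ChangeLog. version_code, features_for, and Lzlib_features are hypothetical names. plzip itself appears to make such decisions at compile time from the lzlib macro rather than by parsing strings.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Optional features and the lzlib version each one needs, per INSTALL.
struct Lzlib_features {
  bool fast_encoder;     // lzlib 1.7 or newer
  bool hd3_nonseekable;  // HD = 3 header checks on non-seekable input: 1.10 or newer
  bool no_copy_testing;  // 'no copy' optimization for testing: 1.12 or newer
};

// Encode "major.minor" as major * 1000 + minor ("1.12" -> 1012).
// Returns -1 if the string does not start with that form.
int version_code(const std::string& version)
{
  const char* const s = version.c_str();
  char* end = nullptr;
  const long major = std::strtol(s, &end, 10);
  if (end == s || major < 0 || *end != '.') return -1;
  const char* const minor_start = end + 1;
  const long minor = std::strtol(minor_start, &end, 10);
  if (end == minor_start || minor < 0 || minor > 999) return -1;
  return static_cast<int>(major * 1000 + minor);
}

// An invalid code (-1) yields no optional features.
Lzlib_features features_for(int code)
{
  const Lzlib_features f = { code >= 1007, code >= 1010, code >= 1012 };
  return f;
}
```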
@@ -33,7 +37,10 @@ the main archive.

To link against a lzlib not installed in a standard place, use:

./configure CPPFLAGS='-I<dir_of_lzlib.h>' LDFLAGS='-L<dir_of_liblz.a>'
./configure CPPFLAGS='-I <includedir>' LDFLAGS='-L <libdir>'

(Replace <includedir> with the directory containing the file lzlib.h,
and <libdir> with the directory containing the file liblz.a).

If you are compiling on MinGW, use --with-mingw (note that the Windows
I/O functions used with MinGW are not guaranteed to be thread safe):
@@ -50,11 +57,11 @@ the main archive.
documentation.

Or type 'make install-compress', which additionally compresses the
info manual and the man page after installation. (Installing
compressed docs may become the default in the future).
info manual and the man page after installation.
(Installing compressed docs may become the default in the future).

You can install only the program, the info manual or the man page by
typing 'make install-bin', 'make install-info' or 'make install-man'
You can install only the program, the info manual, or the man page by
typing 'make install-bin', 'make install-info', or 'make install-man'
respectively.

Instead of 'make install', you can type 'make install-as-lzip' to
@@ -65,10 +72,10 @@ the main archive.
Another way
-----------
You can also compile plzip into a separate directory.
To do this, you must use a version of 'make' that supports the 'VPATH'
variable, such as GNU 'make'. 'cd' to the directory where you want the
To do this, you must use a version of 'make' that supports the variable
'VPATH', such as GNU 'make'. 'cd' to the directory where you want the
object files and executables to go and run the 'configure' script.
'configure' automatically checks for the source code in '.', in '..' and
'configure' automatically checks for the source code in '.', in '..', and
in the directory that 'configure' is in.

'configure' recognizes the option '--srcdir=DIR' to control where to
@@ -79,7 +86,7 @@ After running 'configure', you can run 'make' and 'make install' as
explained above.


Copyright (C) 2009-2019 Antonio Diaz Diaz.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute and modify it.
distribute, and modify it.

@@ -79,7 +79,7 @@ install-info :
-rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"*
$(INSTALL_DATA) $(VPATH)/doc/$(pkgname).info "$(DESTDIR)$(infodir)/$(pkgname).info"
-if $(CAN_RUN_INSTALLINFO) ; then \
install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \
install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \
fi

install-info-compress : install-info
@@ -104,7 +104,7 @@ uninstall-bin :

uninstall-info :
-if $(CAN_RUN_INSTALLINFO) ; then \
install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \
install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \
fi
-rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"*

@@ -129,7 +129,9 @@ dist : doc
$(DISTNAME)/*.cc \
$(DISTNAME)/testsuite/check.sh \
$(DISTNAME)/testsuite/test.txt \
$(DISTNAME)/testsuite/test.txt.lz
$(DISTNAME)/testsuite/fox_*.lz \
$(DISTNAME)/testsuite/test.txt.lz \
$(DISTNAME)/testsuite/test_em.txt.lz
rm -f $(DISTNAME)
lzip -v -9 $(DISTNAME).tar

NEWS | 71

@@ -1,31 +1,58 @@
Changes in version 1.8:
Changes in version 1.9:

The new options '--in-slots' and '--out-slots', setting the number of input
and output packets buffered during streamed decompression, have been added.
Increasing the number of packets may increase decompression speed, but
requires more memory.
Plzip now reports an error if a file name is empty (plzip -t "").

The default number of input packets buffered per worker thread when
decompressing from non-seekable input has been increased from 2 to 4.
Option '-o, --output' now behaves like '-c, --stdout', but sending the
output unconditionally to a file instead of to standard output. See the new
description of '-o' in the manual. This change is backwards compatible only
when (de)compressing from standard input alone. Therefore commands like:
  plzip -o foo.lz - bar < foo
must now be split into:
  plzip -o foo.lz - < foo
  plzip bar
or rewritten as:
  plzip - bar < foo > foo.lz

The default number of output packets buffered per worker thread when
decompressing to non-seekable output has been increased from 32 to 64.
When using '-c' or '-o', plzip now checks whether the output is a terminal
only once.

Detection of forbidden combinations of characters in trailing data has been
improved.
Plzip now does not even open the output file if the input file is a terminal.

Errors are now also checked when closing the input file.
The new option '--check-lib', which compares the version of lzlib used to
compile plzip with the version actually being used at run time, has been added.

The descriptions of '-0..-9', '-m' and '-s' in the manual have been
improved.
The words 'decompressed' and 'compressed' have been replaced with the
shorter 'out' and 'in' in the verbose output when decompressing or testing.

The configure script now accepts the option '--with-mingw' to enable the
compilation of plzip under MS Windows (with the MinGW compiler). Use with
care. The Windows I/O functions used are not guaranteed to be thread safe.
(Code based on a patch by Hannes Domani).
When checking the integrity of multiple files, plzip is now able to continue
checking the rest of the files (instead of exiting) if some of them fail the
test, allowing 'plzip --test' to show a final diagnostic with the number of
files that failed (just as 'lzip --test').

The configure script now accepts appending options to CXXFLAGS using the
syntax 'CXXFLAGS+=OPTIONS'.
Testing is now slightly (1.6%) faster when using lzlib 1.12.

It has been documented in INSTALL the use of
CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO' when compiling on MinGW.
When compressing, or when decompressing or testing from a non-seekable file
or from standard input, plzip now starts only the number of worker threads
required.

When decompressing or testing from a non-seekable file or from standard
input, trailing data are now not counted in the compressed size shown.

When decompressing or testing a multimember file, plzip now shows the
largest dictionary size of all members in the file instead of showing the
dictionary size of the first member.

Option '--list' now reports corruption or truncation of the last header in a
multimenber file specifically instead of showing the generic message "Last
member in input file is truncated or corrupt."

The error messages for 'Data error' and 'Unexpected EOF' have been shortened.

The commands needed to extract files from a tar.lz archive have been
documented in the manual, in the output of '--help', and in the man page.

Tarlz is mentioned in the manual as an alternative to tar + plzip.

Several fixes and improvements have been made to the manual.

8 new test files have been added to the testsuite.

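NEWS says plzip now shows the largest dictionary size of all members instead of that of the first member. In an lzip member header, the sixth byte codes the dictionary size. Bits 4-0 hold the base-2 logarithm of a base size, and bits 7-5 hold how many sixteenths of that base size to subtract. For example, 0xD3 decodes to 2^19 - 6 * 2^15 = 327680 bytes (320 KiB). The sketch below decodes that byte and takes the maximum over all members. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decode the coded dictionary size stored in byte 5 of an lzip header.
// Bits 4-0: base-2 logarithm of the base size; bits 7-5: number of
// sixteenths of the base size to subtract. Following lzip's decoder (treat
// this detail as an assumption), nothing is subtracted from the 4 KiB minimum.
std::uint32_t decode_dictionary_size(std::uint8_t coded)
{
  std::uint32_t size = std::uint32_t(1) << (coded & 0x1F);
  if (size > 4096) size -= (size / 16) * ((coded >> 5) & 7);
  return size;
}

// Sketch: the largest dictionary size among the members of a multimember
// file, given the coded byte from each member header (0 if there are none).
std::uint32_t largest_dictionary_size(const std::vector<std::uint8_t>& coded)
{
  std::uint32_t largest = 0;
  for (const std::uint8_t c : coded) {
    const std::uint32_t size = decode_dictionary_size(c);
    if (size > largest) largest = size;
  }
  return largest;
}
```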
README | 98

@@ -1,30 +1,36 @@
Description

Plzip is a massively parallel (multi-threaded) implementation of lzip, fully
compatible with lzip 1.4 or newer. Plzip uses the lzlib compression library.
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.

Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0)
or compress most files more than bzip2 (lzip -9). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
from a data recovery perspective. Lzip has been designed, written and
tested with great care to replace gzip and bzip2 as the standard
general-purpose compressed format for unix-like systems.
Lzip is a lossless data compressor with a user interface similar to the one
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
interoperability. Lzip can compress about as fast as gzip (lzip -0) or
compress most files more than bzip2 (lzip -9). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
a data recovery perspective. Lzip has been designed, written, and tested
with great care to replace gzip and bzip2 as the standard general-purpose
compressed format for unix-like systems.

Plzip can compress/decompress large files on multiprocessor machines
much faster than lzip, at the cost of a slightly reduced compression
ratio (0.4 to 2 percent larger compressed files). Note that the number
of usable threads is limited by file size; on files larger than a few GB
plzip can use hundreds of processors, but on files of only a few MB
plzip is no faster than lzip.
Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
to 2 percent larger compressed files). Note that the number of usable
threads is limited by file size; on files larger than a few GB plzip can use
hundreds of processors, but on files of only a few MB plzip is no faster
than lzip.

When compressing, plzip divides the input file into chunks and
compresses as many chunks simultaneously as worker threads are chosen,
creating a multimember compressed file.
For creation and manipulation of compressed tar archives tarlz can be more
efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members.

When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as worker threads are chosen, creating a
multimember compressed file.

When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen. Files that were compressed with lzip will not
be decompressed faster than using lzip (unless the '-b' option was used)
be decompressed faster than using lzip (unless the option '-b' was used)
because lzip usually produces single-member files, which can't be
decompressed in parallel.

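The README paragraphs above explain why file size limits the number of usable threads: each chunk becomes one member, compressed by one worker. The sketch below puts that into numbers. usable_workers is hypothetical, the chunk size is a parameter rather than plzip's actual default, and counting an empty file as one chunk is an assumption. The same bound applies when decompressing a regular file, with the number of members in place of the number of chunks, as the 2013 ChangeLog entry notes.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: how many worker threads can be kept busy when a file of
// `file_size` bytes is split into chunks of `chunk_size` bytes, one member
// per chunk. Threads beyond the number of chunks would have nothing to do.
unsigned usable_workers(std::uint64_t file_size, std::uint64_t chunk_size,
                        unsigned requested_threads)
{
  if (chunk_size == 0 || requested_threads == 0) return 0;
  std::uint64_t chunks = file_size / chunk_size + (file_size % chunk_size != 0);
  if (chunks == 0) chunks = 1;  // assumption: an empty file still yields one member
  return chunks < requested_threads ? static_cast<unsigned>(chunks)
                                    : requested_threads;
}
```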
@@ -32,34 +38,34 @@ The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:

* The lzip format provides very safe integrity checking and some data
  recovery means. The lziprecover program can repair bit flip errors
  (one of the most common forms of data corruption) in lzip files,
  and provides data recovery capabilities, including error-checked
  merging of damaged copies of a file.
  recovery means. The program lziprecover can repair bit flip errors
  (one of the most common forms of data corruption) in lzip files, and
  provides data recovery capabilities, including error-checked merging
  of damaged copies of a file.

* The lzip format is as simple as possible (but not simpler). The
  lzip manual provides the source code of a simple decompressor
  along with a detailed explanation of how it works, so that with
  the only help of the lzip manual it would be possible for a
  digital archaeologist to extract the data from a lzip file long
  after quantum computers eventually render LZMA obsolete.
* The lzip format is as simple as possible (but not simpler). The lzip
  manual provides the source code of a simple decompressor along with a
  detailed explanation of how it works, so that with the only help of the
  lzip manual it would be possible for a digital archaeologist to extract
  the data from a lzip file long after quantum computers eventually
  render LZMA obsolete.

* Additionally the lzip reference implementation is copylefted, which
  guarantees that it will remain free forever.

A nice feature of the lzip format is that a corrupt byte is easier to
repair the nearer it is from the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.
A nice feature of the lzip format is that a corrupt byte is easier to repair
the nearer it is from the beginning of the file. Therefore, with the help of
lziprecover, losing an entire archive just because of a corrupt byte near
the beginning is a thing of the past.

Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.

Plzip will automatically use for each file the largest dictionary size
that does not exceed neither the file size nor the limit given. Keep in
mind that the decompression memory requirement is affected at
compression time by the choice of dictionary size limit.
Plzip will automatically use for each file the largest dictionary size that
does not exceed neither the file size nor the limit given. Keep in mind that
the decompression memory requirement is affected at compression time by the
choice of dictionary size limit.

When compressing, plzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
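The dictionary-size rule described above amounts to a clamp. The sketch below assumes the lzip format's limits of 4 KiB and 512 MiB. choose_dictionary_size is hypothetical and skips the rounding needed to store the size in the header's coded byte. For files smaller than 4 KiB it necessarily returns a dictionary larger than the file.

```cpp
#include <cassert>
#include <cstdint>

// Dictionary size limits of the lzip format.
const std::uint64_t min_dictionary_size = std::uint64_t(1) << 12;  // 4 KiB
const std::uint64_t max_dictionary_size = std::uint64_t(1) << 29;  // 512 MiB

// Sketch: the largest dictionary size that exceeds neither the file size nor
// the given limit, kept within the format's range. Rounding to a size the
// header can represent exactly is left out.
std::uint64_t choose_dictionary_size(std::uint64_t file_size, std::uint64_t limit)
{
  std::uint64_t size = file_size < limit ? file_size : limit;
  if (size < min_dictionary_size) size = min_dictionary_size;
  if (size > max_dictionary_size) size = max_dictionary_size;
  return size;
}
```

The decoder needs a buffer roughly the size of the dictionary. This choice, made at compression time, therefore largely determines the decompression memory requirement the README mentions.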
@@ -76,28 +82,28 @@ possible, ownership of the file just as 'cp -p' does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).

Plzip is able to read from some types of non regular files if the
'--stdout' option is specified.
Plzip is able to read from some types of non-regular files if either the
option '-c' or the option '-o' is specified.

If no file names are specified, plzip compresses (or decompresses) from
standard input to standard output. In this case, plzip will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
standard input to standard output. Plzip will refuse to read compressed data
from a terminal or write compressed data to a terminal, as this would be
entirely incomprehensible and might leave the terminal in an abnormal state.

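The terminal checks described above can be sketched with POSIX isatty(). Output_check and may_read_compressed are hypothetical names. The output is evaluated once, when the checker is constructed, which mirrors the NEWS item saying that '-c' and '-o' now check whether the output is a terminal only once.

```cpp
#include <cassert>
#include <unistd.h>  // POSIX isatty()

// Sketch: whether the output is a terminal is evaluated once, when the
// object is built, and the answer is reused for every input file.
class Output_check {
public:
  explicit Output_check(int outfd) : out_is_terminal_(isatty(outfd) != 0) {}

  // Compressed data must not be written to a terminal.
  bool may_write_compressed() const { return !out_is_terminal_; }

private:
  bool out_is_terminal_;
};

// Compressed data must not be read from a terminal either.
bool may_read_compressed(int infd) { return isatty(infd) == 0; }
```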
Plzip will correctly decompress a file which is the concatenation of two or
more compressed files. The result is the concatenation of the corresponding
decompressed files. Integrity testing of concatenated compressed files is
also supported.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
have been compressed. Decompressed is used to refer to data which have
undergone the process of decompression.
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
been compressed. Decompressed is used to refer to data which have undergone
the process of decompression.


Copyright (C) 2009-2019 Antonio Diaz Diaz.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute and modify it.
distribute, and modify it.

The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions that configure

@@ -1,20 +1,20 @@
/* Arg_parser - POSIX/GNU command line argument parser. (C++ version)
Copyright (C) 2006-2019 Antonio Diaz Diaz.
/* Arg_parser - POSIX/GNU command line argument parser. (C++ version)
Copyright (C) 2006-2021 Antonio Diaz Diaz.

This library is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
that the following conditions are met:
This library is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
that the following conditions are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
1. Redistributions of source code must retain the above copyright
notice, this list of conditions, and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions, and the following disclaimer in the
documentation and/or other materials provided with the distribution.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/

#include <cstring>
@@ -167,7 +167,7 @@ Arg_parser::Arg_parser( const int argc, const char * const argv[],
else non_options.push_back( argv[argind++] );
}
}
if( error_.size() ) data.clear();
if( !error_.empty() ) data.clear();
else
{
for( unsigned i = 0; i < non_options.size(); ++i )
@@ -190,7 +190,7 @@ Arg_parser::Arg_parser( const char * const opt, const char * const arg,
{ if( opt[2] ) parse_long_option( opt, arg, options, argind ); }
else
parse_short_option( opt, arg, options, argind );
if( error_.size() ) data.clear();
if( !error_.empty() ) data.clear();
}
else data.push_back( Record( opt ) );
}

69
arg_parser.h
69
arg_parser.h
|
@ -1,43 +1,43 @@
|
|||
/* Arg_parser - POSIX/GNU command line argument parser. (C++ version)
Copyright (C) 2006-2019 Antonio Diaz Diaz.
/* Arg_parser - POSIX/GNU command line argument parser. (C++ version)
Copyright (C) 2006-2021 Antonio Diaz Diaz.

This library is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
that the following conditions are met:
This library is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
that the following conditions are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
1. Redistributions of source code must retain the above copyright
notice, this list of conditions, and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions, and the following disclaimer in the
documentation and/or other materials provided with the distribution.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/

/* Arg_parser reads the arguments in 'argv' and creates a number of
option codes, option arguments and non-option arguments.
/* Arg_parser reads the arguments in 'argv' and creates a number of
option codes, option arguments, and non-option arguments.

In case of error, 'error' returns a non-empty error message.
In case of error, 'error' returns a non-empty error message.

'options' is an array of 'struct Option' terminated by an element
containing a code which is zero. A null name means a short-only
option. A code value outside the unsigned char range means a
long-only option.
'options' is an array of 'struct Option' terminated by an element
containing a code which is zero. A null name means a short-only
option. A code value outside the unsigned char range means a
long-only option.

Arg_parser normally makes it appear as if all the option arguments
were specified before all the non-option arguments for the purposes
of parsing, even if the user of your program intermixed option and
non-option arguments. If you want the arguments in the exact order
the user typed them, call 'Arg_parser' with 'in_order' = true.
Arg_parser normally makes it appear as if all the option arguments
were specified before all the non-option arguments for the purposes
of parsing, even if the user of your program intermixed option and
non-option arguments. If you want the arguments in the exact order
the user typed them, call 'Arg_parser' with 'in_order' = true.

The argument '--' terminates all options; any following arguments are
treated as non-option arguments, even if they begin with a hyphen.
The argument '--' terminates all options; any following arguments are
treated as non-option arguments, even if they begin with a hyphen.

The syntax for optional option arguments is '-<short_option><argument>'
(without whitespace), or '--<long_option>=<argument>'.
The syntax for optional option arguments is '-<short_option><argument>'
(without whitespace), or '--<long_option>=<argument>'.
*/

class Arg_parser
@@ -61,6 +61,7 @@ private:
explicit Record( const char * const arg ) : code( 0 ), argument( arg ) {}
};

const std::string empty_arg;
std::string error_;
std::vector< Record > data;

@@ -73,17 +74,17 @@ public:
Arg_parser( const int argc, const char * const argv[],
const Option options[], const bool in_order = false );

// Restricted constructor. Parses a single token and argument (if any)
// Restricted constructor. Parses a single token and argument (if any).
Arg_parser( const char * const opt, const char * const arg,
const Option options[] );

const std::string & error() const { return error_; }

// The number of arguments parsed (may be different from argc)
// The number of arguments parsed. May be different from argc.
int arguments() const { return data.size(); }

// If code( i ) is 0, argument( i ) is a non-option.
// Else argument( i ) is the option's argument (or empty).
/* If code( i ) is 0, argument( i ) is a non-option.
Else argument( i ) is the option's argument (or empty). */
int code( const int i ) const
{
if( i >= 0 && i < arguments() ) return data[i].code;
@@ -93,6 +94,6 @@ public:
const std::string & argument( const int i ) const
{
if( i >= 0 && i < arguments() ) return data[i].argument;
else return error_;
else return empty_arg;
}
};
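The change above makes an out-of-range `argument( i )` return a reference to a dedicated empty string (`empty_arg`) instead of `error_`, so a caller that indexes past the end can no longer receive the error message as if it were an argument. Below is a minimal, self-contained sketch of that accessor pattern; `Record_list` is a hypothetical class for illustration and is not part of Arg_parser.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of the Arg_parser accessor pattern (not the real class).
class Record_list
  {
  const std::string empty_arg;          // sentinel for out-of-range indices
  std::string error_;
  std::vector< std::string > args;

public:
  Record_list( const std::vector< std::string > & a, const std::string & err )
    : error_( err ), args( a ) {}

  const std::string & error() const { return error_; }
  int arguments() const { return (int)args.size(); }

  // Bounds-checked access: never hands out error_ as if it were an argument.
  const std::string & argument( const int i ) const
    {
    if( i >= 0 && i < arguments() ) return args[i];
    else return empty_arg;
    }
  };
```

Returning a reference to a member sentinel, rather than to a temporary, keeps the `const std::string &` return type valid for as long as the object lives.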
231
compress.cc
@@ -1,19 +1,19 @@
/* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2019 Antonio Diaz Diaz.
/* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#define _FILE_OFFSET_BITS 64
@@ -27,7 +27,6 @@
#include <cstring>
#include <string>
#include <vector>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <lzlib.h>
@@ -39,9 +38,9 @@
#endif


// Returns the number of bytes really read.
// If (returned value < size) and (errno == 0), means EOF was reached.
//
/* Returns the number of bytes really read.
If (returned value < size) and (errno == 0), means EOF was reached.
*/
int readblock( const int fd, uint8_t * const buf, const int size )
{
int sz = 0;
@@ -58,9 +57,9 @@ int readblock( const int fd, uint8_t * const buf, const int size )
}


// Returns the number of bytes really written.
// If (returned value < size), it is always an error.
//
/* Returns the number of bytes really written.
If (returned value < size), it is always an error.
*/
int writeblock( const int fd, const uint8_t * const buf, const int size )
{
int sz = 0;
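The comments above state the contract of `readblock` and `writeblock`, but the hunks do not show their loop bodies. Here is a minimal sketch of one way to honor that contract with POSIX `read` and `write`, assuming the usual retry-on-`EINTR` loop; `readblock_sketch` and `writeblock_sketch` are hypothetical names, not the plzip functions.

```cpp
#include <cassert>
#include <cerrno>
#include <stdint.h>
#include <unistd.h>

// Hypothetical sketch: loop until 'size' bytes are read, EOF, or a real error.
// A short count with errno == 0 means EOF; with errno != 0 it means an error.
int readblock_sketch( const int fd, uint8_t * const buf, const int size )
  {
  int sz = 0;
  errno = 0;
  while( sz < size )
    {
    const ssize_t n = read( fd, buf + sz, size - sz );
    if( n > 0 ) sz += static_cast< int >( n );
    else if( n == 0 ) break;            // EOF: errno stays 0
    else if( errno != EINTR ) break;    // real error: errno is left set
    errno = 0;                          // after EINTR or a partial read
    }
  return sz;
  }

// Hypothetical sketch: any return value smaller than 'size' is an error.
int writeblock_sketch( const int fd, const uint8_t * const buf, const int size )
  {
  int sz = 0;
  errno = 0;
  while( sz < size )
    {
    const ssize_t n = write( fd, buf + sz, size - sz );
    if( n > 0 ) sz += static_cast< int >( n );
    else if( n == 0 || errno != EINTR ) break;  // give up; caller reports it
    errno = 0;
    }
  return sz;
  }
```

Under this contract a short read with `errno == 0` signals end of file, which is why the splitter in this file checks `errno` only when `size != data_size`.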
@@ -150,7 +149,7 @@ namespace {

unsigned long long in_size = 0;
unsigned long long out_size = 0;
const char * const mem_msg = "Not enough memory. Try a smaller dictionary size";
const char * const mem_msg2 = "Not enough memory. Try a smaller dictionary size.";


struct Packet // data block with a serial number
@@ -235,8 +234,7 @@ public:
xunlock( &imutex );
if( !ipacket ) // EOF
{
// notify muxer when last worker exits
xlock( &omutex );
xlock( &omutex ); // notify muxer when last worker exits
if( --num_working == 0 ) xsignal( &oav_or_exit );
xunlock( &omutex );
}
@@ -284,12 +282,16 @@ public:
void return_empty_packet() // return a slot to the tally
{ slot_tally.leave_slot(); }

void finish() // splitter has no more packets to send
void finish( const int workers_spared )
{
xlock( &imutex );
xlock( &imutex ); // splitter has no more packets to send
eof = true;
xbroadcast( &iav_or_eof );
xunlock( &imutex );
xlock( &omutex ); // notify muxer if all workers have exited
num_working -= workers_spared;
if( num_working <= 0 ) xsignal( &oav_or_exit );
xunlock( &omutex );
}

bool finished() // all packets delivered to muxer
@@ -303,52 +305,6 @@ public:
};


struct Splitter_arg
{
Packet_courier * courier;
const Pretty_print * pp;
int infd;
int data_size;
int offset;
};


// split data from input file into chunks and pass them to
// courier for packaging and distribution to workers.
extern "C" void * csplitter( void * arg )
{
const Splitter_arg & tmp = *(const Splitter_arg *)arg;
Packet_courier & courier = *tmp.courier;
const Pretty_print & pp = *tmp.pp;
const int infd = tmp.infd;
const int data_size = tmp.data_size;
const int offset = tmp.offset;

for( bool first_post = true; ; first_post = false )
{
uint8_t * const data = new( std::nothrow ) uint8_t[offset+data_size];
if( !data ) { pp( mem_msg ); cleanup_and_fail(); }
const int size = readblock( infd, data + offset, data_size );
if( size != data_size && errno )
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); }

if( size > 0 || first_post ) // first packet may be empty
{
in_size += size;
courier.receive_packet( data, size );
if( size < data_size ) break; // EOF
}
else
{
delete[] data;
break;
}
}
courier.finish(); // no more packets to send
return 0;
}


struct Worker_arg
{
Packet_courier * courier;
@@ -358,9 +314,18 @@ struct Worker_arg
int offset;
};

struct Splitter_arg
{
struct Worker_arg worker_arg;
pthread_t * worker_threads;
int infd;
int data_size;
int num_workers; // returned by splitter to main thread
};

// get packets from courier, replace their contents, and return
// them to courier.

/* Get packets from courier, replace their contents, and return them to
courier. */
extern "C" void * cworker( void * arg )
{
const Worker_arg & tmp = *(const Worker_arg *)arg;
@@ -386,7 +351,7 @@ extern "C" void * cworker( void * arg )
if( !encoder || LZ_compress_errno( encoder ) != LZ_ok )
{
if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error )
pp( mem_msg );
pp( mem_msg2 );
else
internal_error( "invalid argument to encoder." );
cleanup_and_fail();
@@ -435,8 +400,57 @@ extern "C" void * cworker( void * arg )
}


// get from courier the processed and sorted packets, and write
// their contents to the output file.
/* Split data from input file into chunks and pass them to courier for
packaging and distribution to workers.
Start a worker per packet up to a maximum of num_workers.
*/
extern "C" void * csplitter( void * arg )
{
Splitter_arg & tmp = *(Splitter_arg *)arg;
Packet_courier & courier = *tmp.worker_arg.courier;
const Pretty_print & pp = *tmp.worker_arg.pp;
pthread_t * const worker_threads = tmp.worker_threads;
const int offset = tmp.worker_arg.offset;
const int infd = tmp.infd;
const int data_size = tmp.data_size;
int i = 0; // number of workers started

for( bool first_post = true; ; first_post = false )
{
uint8_t * const data = new( std::nothrow ) uint8_t[offset+data_size];
if( !data ) { pp( mem_msg2 ); cleanup_and_fail(); }
const int size = readblock( infd, data + offset, data_size );
if( size != data_size && errno )
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); }

if( size > 0 || first_post ) // first packet may be empty
{
in_size += size;
courier.receive_packet( data, size );
if( i < tmp.num_workers ) // start a new worker
{
const int errcode =
pthread_create( &worker_threads[i++], 0, cworker, &tmp.worker_arg );
if( errcode ) { show_error( "Can't create worker threads", errcode );
cleanup_and_fail(); }
}
if( size < data_size ) break; // EOF
}
else
{
delete[] data;
break;
}
}
courier.finish( tmp.num_workers - i ); // no more packets to send
tmp.num_workers = i;
return 0;
}


/* Get from courier the processed and sorted packets, and write their
contents to the output file.
*/
void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
{
std::vector< const Packet * > packet_vector;
@@ -450,8 +464,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
const Packet * const opacket = packet_vector[i];
out_size += opacket->size;

const int wr = writeblock( outfd, opacket->data, opacket->size );
if( wr != opacket->size )
if( writeblock( outfd, opacket->data, opacket->size ) != opacket->size )
{ pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
delete[] opacket->data;
courier.return_empty_packet();
@@ -462,8 +475,8 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
} // end namespace


// init the courier, then start the splitter and the workers and
// call the muxer.
/* Init the courier, then start the splitter and the workers and call the
muxer. */
int compress( const unsigned long long cfile_size,
const int data_size, const int dictionary_size,
const int match_len_limit, const int num_workers,
@@ -478,50 +491,44 @@ int compress( const unsigned long long cfile_size,
out_size = 0;
Packet_courier courier( num_workers, num_slots );

if( debug_level & 2 ) std::fputs( "compress.\n", stderr );

pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
if( !worker_threads ) { pp( mem_msg ); return 1; }

Splitter_arg splitter_arg;
splitter_arg.courier = &courier;
splitter_arg.pp = &pp;
splitter_arg.worker_arg.courier = &courier;
splitter_arg.worker_arg.pp = &pp;
splitter_arg.worker_arg.dictionary_size = dictionary_size;
splitter_arg.worker_arg.match_len_limit = match_len_limit;
splitter_arg.worker_arg.offset = offset;
splitter_arg.worker_threads = worker_threads;
splitter_arg.infd = infd;
splitter_arg.data_size = data_size;
splitter_arg.offset = offset;
splitter_arg.num_workers = num_workers;

pthread_t splitter_thread;
int errcode = pthread_create( &splitter_thread, 0, csplitter, &splitter_arg );
if( errcode )
{ show_error( "Can't create splitter thread", errcode ); cleanup_and_fail(); }
{ show_error( "Can't create splitter thread", errcode );
delete[] worker_threads; return 1; }
if( verbosity >= 1 ) pp();
show_progress( 0, cfile_size, &pp ); // init

Worker_arg worker_arg;
worker_arg.courier = &courier;
worker_arg.pp = &pp;
worker_arg.dictionary_size = dictionary_size;
worker_arg.match_len_limit = match_len_limit;
worker_arg.offset = offset;

pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
if( !worker_threads ) { pp( mem_msg ); cleanup_and_fail(); }
for( int i = 0; i < num_workers; ++i )
{
errcode = pthread_create( worker_threads + i, 0, cworker, &worker_arg );
if( errcode )
{ show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
}

muxer( courier, pp, outfd );

for( int i = num_workers - 1; i >= 0; --i )
{
errcode = pthread_join( splitter_thread, 0 );
if( errcode ) { show_error( "Can't join splitter thread", errcode );
cleanup_and_fail(); }

for( int i = splitter_arg.num_workers; --i >= 0; )
{ // join only the workers started
errcode = pthread_join( worker_threads[i], 0 );
if( errcode )
{ show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
if( errcode ) { show_error( "Can't join worker threads", errcode );
cleanup_and_fail(); }
}
delete[] worker_threads;

errcode = pthread_join( splitter_thread, 0 );
if( errcode )
{ show_error( "Can't join splitter thread", errcode ); cleanup_and_fail(); }

if( verbosity >= 1 )
{
if( in_size == 0 || out_size == 0 )
@@ -537,14 +544,14 @@ int compress( const unsigned long long cfile_size,

if( debug_level & 1 )
std::fprintf( stderr,
"workers started %8u\n"
"any worker tried to consume from splitter %8u times\n"
"any worker had to wait %8u times\n"
"muxer tried to consume from workers %8u times\n"
"muxer had to wait %8u times\n",
courier.icheck_counter,
courier.iwait_counter,
courier.ocheck_counter,
courier.owait_counter );
splitter_arg.num_workers,
courier.icheck_counter, courier.iwait_counter,
courier.ocheck_counter, courier.owait_counter );

if( !courier.finished() ) internal_error( "courier not finished." );
return 0;
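In 1.9 the splitter, rather than `compress`, starts the compression workers. It starts one worker after each packet it queues, up to `num_workers`, then calls `courier.finish( tmp.num_workers - i )` so the muxer does not wait for workers that were never started. It also reports the number actually started back through `splitter_arg.num_workers`, so only those threads are joined. The sketch below models that idea with `std::thread` and a small blocking queue in place of pthreads and `Packet_courier`. `Packet_queue`, `sum_worker`, and `split_and_start` are hypothetical names, and each "packet" is just an int that the workers add up.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the input side of Packet_courier.
class Packet_queue
  {
  std::mutex m;
  std::condition_variable cv;
  std::deque< int > q;
  bool eof = false;

public:
  void push( const int v )
    { { std::lock_guard< std::mutex > lock( m ); q.push_back( v ); }
      cv.notify_one(); }
  void finish()                         // no more packets will be pushed
    { { std::lock_guard< std::mutex > lock( m ); eof = true; }
      cv.notify_all(); }
  bool pop( int & v )                   // false once eof and queue drained
    {
    std::unique_lock< std::mutex > lock( m );
    cv.wait( lock, [this]{ return !q.empty() || eof; } );
    if( q.empty() ) return false;
    v = q.front(); q.pop_front();
    return true;
    }
  };

void sum_worker( Packet_queue & in, std::atomic< long > & sum )
  { int v; while( in.pop( v ) ) sum += v; }

// Queue each packet and start one worker per packet, up to max_workers.
// Returns the number of workers actually started.
int split_and_start( const std::vector< int > & packets, const int max_workers,
                     Packet_queue & in, std::vector< std::thread > & workers,
                     std::atomic< long > & sum )
  {
  int started = 0;
  for( const int p : packets )
    {
    in.push( p );
    if( started < max_workers )
      { workers.emplace_back( sum_worker, std::ref( in ), std::ref( sum ) );
        ++started; }
    }
  in.finish();                          // idle workers exit after draining
  return started;
  }
```

Joining only the threads in `workers` corresponds to the new loop `for( int i = splitter_arg.num_workers; --i >= 0; )`. With a small input, fewer threads are created than `num_workers` allows.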
27
configure
vendored
@@ -1,12 +1,12 @@
#! /bin/sh
# configure script for Plzip - Massively parallel implementation of lzip
# Copyright (C) 2009-2019 Antonio Diaz Diaz.
# Copyright (C) 2009-2021 Antonio Diaz Diaz.
#
# This configure script is free software: you have unlimited permission
# to copy, distribute and modify it.
# to copy, distribute, and modify it.

pkgname=plzip
pkgversion=1.8
pkgversion=1.9
progname=plzip
with_mingw=
srctrigger=doc/${pkgname}.texi
@@ -27,11 +27,7 @@ CXXFLAGS='-Wall -W -O2'
LDFLAGS=

# checking whether we are using GNU C++.
/bin/sh -c "${CXX} --version" > /dev/null 2>&1 ||
{
CXX=c++
CXXFLAGS=-O2
}
/bin/sh -c "${CXX} --version" > /dev/null 2>&1 || { CXX=c++ ; CXXFLAGS=-O2 ; }

# Loop over all args
args=
@@ -43,11 +39,12 @@ while [ $# != 0 ] ; do
shift

# Add the argument quoted to args
args="${args} \"${option}\""
if [ -z "${args}" ] ; then args="\"${option}\""
else args="${args} \"${option}\"" ; fi

# Split out the argument for options that take them
case ${option} in
*=*) optarg=`echo ${option} | sed -e 's,^[^=]*=,,;s,/$,,'` ;;
*=*) optarg=`echo "${option}" | sed -e 's,^[^=]*=,,;s,/$,,'` ;;
esac

# Process the options
@@ -128,7 +125,7 @@ if [ -z "${srcdir}" ] ; then
if [ ! -r "${srcdir}/${srctrigger}" ] ; then srcdir=.. ; fi
if [ ! -r "${srcdir}/${srctrigger}" ] ; then
## the sed command below emulates the dirname command
srcdir=`echo $0 | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
srcdir=`echo "$0" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
fi
fi

@@ -151,7 +148,7 @@ if [ -z "${no_create}" ] ; then
# Run this file to recreate the current configuration.
#
# This script is free software: you have unlimited permission
# to copy, distribute and modify it.
# to copy, distribute, and modify it.

exec /bin/sh $0 ${args} --no-create
EOF
@@ -174,11 +171,11 @@ echo "LDFLAGS = ${LDFLAGS}"
rm -f Makefile
cat > Makefile << EOF
# Makefile for Plzip - Massively parallel implementation of lzip
# Copyright (C) 2009-2019 Antonio Diaz Diaz.
# Copyright (C) 2009-2021 Antonio Diaz Diaz.
# This file was generated automatically by configure. Don't edit.
#
# This Makefile is free software: you have unlimited permission
# to copy, distribute and modify it.
# to copy, distribute, and modify it.

pkgname = ${pkgname}
pkgversion = ${pkgversion}
@@ -199,5 +196,5 @@ EOF
cat "${srcdir}/Makefile.in" >> Makefile

echo "OK. Now you can run make."
echo "If make fails, verify that the lzlib compression library is correctly"
echo "If make fails, verify that the compression library lzlib is correctly"
echo "installed (see INSTALL)."

205
dec_stdout.cc
@@ -1,19 +1,19 @@
/* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2019 Antonio Diaz Diaz.
/* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#define _FILE_OFFSET_BITS 64
@@ -28,7 +28,6 @@
#include <queue>
#include <string>
#include <vector>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <lzlib.h>
@@ -44,10 +43,13 @@ enum { max_packet_size = 1 << 20 };

struct Packet // data block
{
uint8_t * data; // data == 0 means end of member
uint8_t * data; // data may be null if size == 0
int size; // number of bytes in data (if any)
explicit Packet( uint8_t * const d = 0, const int s = 0 )
: data( d ), size( s ) {}
bool eom; // end of member
Packet() : data( 0 ), size( 0 ), eom( true ) {}
Packet( uint8_t * const d, const int s, const bool e )
: data( d ), size( s ), eom ( e ) {}
~Packet() { if( data ) delete[] data; }
};


@@ -58,23 +60,25 @@ public:
unsigned owait_counter;
private:
int deliver_worker_id; // worker queue currently delivering packets
std::vector< std::queue< Packet * > > opacket_queues;
std::vector< std::queue< const Packet * > > opacket_queues;
int num_working; // number of workers still running
const int num_workers; // number of workers
const unsigned out_slots; // max output packets per queue
pthread_mutex_t omutex;
pthread_cond_t oav_or_exit; // output packet available or all workers exited
std::vector< pthread_cond_t > slot_av; // output slot available
const Shared_retval & shared_retval; // discard new packets on error

Packet_courier( const Packet_courier & ); // declared as private
void operator=( const Packet_courier & ); // declared as private

public:
Packet_courier( const int workers, const int slots )
: ocheck_counter( 0 ), owait_counter( 0 ),
deliver_worker_id( 0 ),
Packet_courier( const Shared_retval & sh_ret, const int workers,
const int slots )
: ocheck_counter( 0 ), owait_counter( 0 ), deliver_worker_id( 0 ),
opacket_queues( workers ), num_working( workers ),
num_workers( workers ), out_slots( slots ), slot_av( workers )
num_workers( workers ), out_slots( slots ), slot_av( workers ),
shared_retval( sh_ret )
{
xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] );
@@ -82,6 +86,10 @@ public:

~Packet_courier()
{
if( shared_retval() ) // cleanup to avoid memory leaks
for( int i = 0; i < num_workers; ++i )
while( !opacket_queues[i].empty() )
{ delete opacket_queues[i].front(); opacket_queues[i].pop(); }
for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
}
@@ -94,25 +102,28 @@ public:
xunlock( &omutex );
}

// collect a packet from a worker
void collect_packet( Packet * const opacket, const int worker_id )
// collect a packet from a worker, discard packet on error
void collect_packet( const Packet * const opacket, const int worker_id )
{
xlock( &omutex );
if( opacket->data )
{
while( opacket_queues[worker_id].size() >= out_slots )
{
if( shared_retval() ) { delete opacket; goto done; }
xwait( &slot_av[worker_id], &omutex );
}
}
opacket_queues[worker_id].push( opacket );
if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
done:
xunlock( &omutex );
}

// deliver a packet to muxer
// if packet data == 0, move to next queue and wait again
Packet * deliver_packet()
/* deliver a packet to muxer
if packet->eom, move to next queue
if packet data == 0, wait again */
const Packet * deliver_packet()
{
Packet * opacket = 0;
const Packet * opacket = 0;
xlock( &omutex );
++ocheck_counter;
while( true )
@@ -127,8 +138,9 @@ public:
opacket_queues[deliver_worker_id].pop();
if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
xsignal( &slot_av[deliver_worker_id] );
if( opacket->eom && ++deliver_worker_id >= num_workers )
deliver_worker_id = 0;
if( opacket->data ) break;
if( ++deliver_worker_id >= num_workers ) deliver_worker_id = 0;
delete opacket; opacket = 0;
}
xunlock( &omutex );
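With the new `eom` flag, a packet can carry data and also mark the end of a member. `deliver_packet` moves to the next worker's queue only after an `eom` packet, and it discards packets whose `data` is null. Because the worker loop `for( long i = worker_id; i < lzip_index.members(); i += num_workers )` assigns member `i` to worker `i % num_workers`, visiting the queues round-robin in this way restores the original member order. The single-threaded model below assumes all queues are already filled; the real courier waits on `oav_or_exit` instead. `Model_packet` and `deliver_all` are hypothetical names.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical model packet: 'data' may be empty; 'eom' marks end of member.
struct Model_packet
  {
  std::string data;
  bool eom;
  };

// Drain prefilled per-worker queues in member order. Advances to the next
// queue only after an eom packet, as the new deliver_packet does.
std::string deliver_all( std::vector< std::queue< Model_packet > > & queues,
                         const int members )
  {
  std::string out;
  int id = 0;                           // queue currently delivering
  for( int done = 0; done < members; )
    {
    const Model_packet p = queues[id].front();
    queues[id].pop();
    out += p.data;                      // empty packets contribute nothing
    if( p.eom )
      { ++done; if( ++id >= (int)queues.size() ) id = 0; }
    }
  return out;
  }
```

In the test, worker 0 produces members 0 and 2 and worker 1 produces member 1, whose last packet is an empty `eom` marker.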
@@ -150,32 +162,34 @@ struct Worker_arg
const Lzip_index * lzip_index;
Packet_courier * courier;
const Pretty_print * pp;
Shared_retval * shared_retval;
int worker_id;
int num_workers;
int infd;
};


// read members from file, decompress their contents, and
// give the produced packets to courier.
/* Read members from file, decompress their contents, and give to courier
the packets produced.
*/
extern "C" void * dworker_o( void * arg )
{
const Worker_arg & tmp = *(const Worker_arg *)arg;
const Lzip_index & lzip_index = *tmp.lzip_index;
Packet_courier & courier = *tmp.courier;
const Pretty_print & pp = *tmp.pp;
Shared_retval & shared_retval = *tmp.shared_retval;
const int worker_id = tmp.worker_id;
const int num_workers = tmp.num_workers;
const int infd = tmp.infd;
const int buffer_size = 65536;

uint8_t * new_data = new( std::nothrow ) uint8_t[max_packet_size];
int new_pos = 0;
uint8_t * new_data = 0;
uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size];
LZ_Decoder * const decoder = LZ_decompress_open();
if( !new_data || !ibuffer || !decoder ||
LZ_decompress_errno( decoder ) != LZ_ok )
{ pp( "Not enough memory." ); cleanup_and_fail(); }
int new_pos = 0;
if( !ibuffer || !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
{ if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }

for( long i = worker_id; i < lzip_index.members(); i += num_workers )
{
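The decompression workers now report errors through `shared_retval` instead of calling `cleanup_and_fail()` directly, for example `if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done;`, and the other workers poll `shared_retval()` and stop. The definition of `Shared_retval` is not part of these hunks, so the following is only a sketch under an assumption: that `set_value` stores the first nonzero value and returns true only to the caller that stored it, so exactly one diagnostic is printed. `Shared_retval_sketch` and `count_reporters` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for Shared_retval (its real definition is not shown).
class Shared_retval_sketch
  {
  mutable std::mutex m;
  int retval = 0;

public:
  // Stores 'val' only if no error was recorded yet; true if this call stored it.
  bool set_value( const int val )
    {
    std::lock_guard< std::mutex > lock( m );
    if( retval == 0 && val != 0 ) { retval = val; return true; }
    return false;
    }
  // Nonzero once any worker has failed; workers poll this to stop early.
  int operator()() const
    { std::lock_guard< std::mutex > lock( m ); return retval; }
  };

// Starts 'n' threads that all report an error at once and returns how many
// of them were told to print the diagnostic.
int count_reporters( Shared_retval_sketch & sr, const int n )
  {
  std::atomic< int > reporters( 0 );
  std::vector< std::thread > threads;
  for( int i = 0; i < n; ++i )
    threads.emplace_back( [&sr, &reporters]{ if( sr.set_value( 1 ) ) ++reporters; } );
  for( std::thread & t : threads ) t.join();
  return reporters.load();
  }
```

Using a "first setter wins" flag lets every worker exit through its own cleanup path (`goto done`) while the user sees a single error message.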
@@ -184,6 +198,7 @@ extern "C" void * dworker_o( void * arg )

while( member_rest > 0 )
{
if( shared_retval() ) goto done; // other worker found a problem
while( LZ_decompress_write_size( decoder ) > 0 )
{
const int size = std::min( LZ_decompress_write_size( decoder ),
@@ -191,7 +206,8 @@ extern "C" void * dworker_o( void * arg )
if( size > 0 )
{
if( preadblock( infd, ibuffer, size, member_pos ) != size )
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
{ if( shared_retval.set_value( 1 ) )
{ pp(); show_error( "Read error", errno ); } goto done; }
member_pos += size;
member_rest -= size;
if( LZ_decompress_write( decoder, ibuffer, size ) != size )
@ -201,60 +217,60 @@ extern "C" void * dworker_o( void * arg )
}
while( true ) // read and pack decompressed data
{
if( !new_data &&
!( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) )
{ if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
const int rd = LZ_decompress_read( decoder, new_data + new_pos,
max_packet_size - new_pos );
if( rd < 0 )
cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
{ decompress_error( decoder, pp, shared_retval, worker_id );
goto done; }
new_pos += rd;
if( new_pos > max_packet_size )
internal_error( "opacket size exceeded in worker." );
if( new_pos == max_packet_size ||
LZ_decompress_finished( decoder ) == 1 )
const bool eom = LZ_decompress_finished( decoder ) == 1;
if( new_pos == max_packet_size || eom ) // make data packet
{
if( new_pos > 0 ) // make data packet
{
Packet * const opacket = new Packet( new_data, new_pos );
courier.collect_packet( opacket, worker_id );
new_pos = 0;
new_data = new( std::nothrow ) uint8_t[max_packet_size];
if( !new_data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
}
if( LZ_decompress_finished( decoder ) == 1 )
{ // end of member token
courier.collect_packet( new Packet, worker_id );
LZ_decompress_reset( decoder ); // prepare for new member
break;
}
const Packet * const opacket =
new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
courier.collect_packet( opacket, worker_id );
if( new_pos > 0 ) { new_pos = 0; new_data = 0; }
if( eom )
{ LZ_decompress_reset( decoder ); // prepare for new member
break; }
}
if( rd == 0 ) break;
}
}
show_progress( lzip_index.mblock( i ).size() );
}

delete[] ibuffer; delete[] new_data;
if( LZ_decompress_member_position( decoder ) != 0 )
{ pp( "Error, some data remains in decoder." ); cleanup_and_fail(); }
if( LZ_decompress_close( decoder ) < 0 )
{ pp( "LZ_decompress_close failed." ); cleanup_and_fail(); }
done:
delete[] ibuffer; if( new_data ) delete[] new_data;
if( LZ_decompress_member_position( decoder ) != 0 &&
shared_retval.set_value( 1 ) )
pp( "Error, some data remains in decoder." );
if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
pp( "LZ_decompress_close failed." );
courier.worker_finished();
return 0;
}


// get from courier the processed and sorted packets, and write
// their contents to the output file.
void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
/* Get from courier the processed and sorted packets, and write their
contents to the output file. Drain queue on error.
*/
void muxer( Packet_courier & courier, const Pretty_print & pp,
Shared_retval & shared_retval, const int outfd )
{
while( true )
{
Packet * const opacket = courier.deliver_packet();
const Packet * const opacket = courier.deliver_packet();
if( !opacket ) break; // queue is empty. all workers exited

const int wr = writeblock( outfd, opacket->data, opacket->size );
if( wr != opacket->size )
{ pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
delete[] opacket->data;
if( shared_retval() == 0 &&
writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
shared_retval.set_value( 1 ) )
{ pp(); show_error( "Write error", errno ); }
delete opacket;
}
}
@ -262,66 +278,59 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
} // end namespace


// init the courier, then start the workers and call the muxer.
// init the courier, then start the workers and call the muxer.
int dec_stdout( const int num_workers, const int infd, const int outfd,
const Pretty_print & pp, const int debug_level,
const int out_slots, const Lzip_index & lzip_index )
{
Packet_courier courier( num_workers, out_slots );
Shared_retval shared_retval;
Packet_courier courier( shared_retval, num_workers, out_slots );

Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
if( !worker_args || !worker_threads )
{ pp( "Not enough memory." ); cleanup_and_fail(); }
for( int i = 0; i < num_workers; ++i )
{ pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }

int i = 0; // number of workers started
for( ; i < num_workers; ++i )
{
worker_args[i].lzip_index = &lzip_index;
worker_args[i].courier = &courier;
worker_args[i].pp = &pp;
worker_args[i].shared_retval = &shared_retval;
worker_args[i].worker_id = i;
worker_args[i].num_workers = num_workers;
worker_args[i].infd = infd;
const int errcode =
pthread_create( &worker_threads[i], 0, dworker_o, &worker_args[i] );
if( errcode )
{ show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
{ if( shared_retval.set_value( 1 ) )
{ show_error( "Can't create worker threads", errcode ); } break; }
}

muxer( courier, pp, outfd );
muxer( courier, pp, shared_retval, outfd );

for( int i = num_workers - 1; i >= 0; --i )
while( --i >= 0 )
{
const int errcode = pthread_join( worker_threads[i], 0 );
if( errcode )
{ show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
if( errcode && shared_retval.set_value( 1 ) )
show_error( "Can't join worker threads", errcode );
}
delete[] worker_threads;
delete[] worker_args;

if( verbosity >= 2 )
{
if( verbosity >= 4 ) show_header( lzip_index.dictionary_size( 0 ) );
const unsigned long long in_size = lzip_index.cdata_size();
const unsigned long long out_size = lzip_index.udata_size();
if( out_size == 0 || in_size == 0 )
std::fputs( "no data compressed. ", stderr );
else
std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
(double)out_size / in_size,
( 100.0 * in_size ) / out_size,
100.0 - ( ( 100.0 * in_size ) / out_size ) );
if( verbosity >= 3 )
std::fprintf( stderr, "decompressed %9llu, compressed %8llu. ",
out_size, in_size );
}
if( verbosity >= 1 ) std::fputs( "done\n", stderr );
if( shared_retval() ) return shared_retval(); // some thread found a problem

if( verbosity >= 1 )
show_results( lzip_index.cdata_size(), lzip_index.udata_size(),
lzip_index.dictionary_size(), false );

if( debug_level & 1 )
std::fprintf( stderr,
"workers started %8u\n"
"muxer tried to consume from workers %8u times\n"
"muxer had to wait %8u times\n",
courier.ocheck_counter,
courier.owait_counter );
num_workers, courier.ocheck_counter, courier.owait_counter );

if( !courier.finished() ) internal_error( "courier not finished." );
return 0;
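Most of the 1.9 changes in dec_stdout.cc above follow one pattern. Instead of calling cleanup_and_fail(), a thread that hits a problem records an exit status in a Shared_retval and prints a message only if set_value() returns true. It then jumps to its cleanup code. Shared_retval itself is declared in lzip.h, which this page does not show. The class below is therefore a hypothetical stand-in whose semantics are inferred from the calls in the diff: the first nonzero value wins, and only the thread that stored it is told to report. It is a minimal C++11 sketch, not plzip's implementation. SharedRetval and count_reporters() are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for plzip's Shared_retval (real one is in lzip.h).
// Holds the first nonzero exit status reported by any thread.
class SharedRetval {
  mutable std::mutex mtx_;
  int retval_ = 0;

 public:
  // Current status; 0 means no thread has reported a problem yet.
  int operator()() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return retval_;
  }

  // Store 'val' only if nothing was stored before. Returns true only to the
  // caller whose value was stored, so exactly one thread prints a message,
  // as in: if( shared_retval.set_value( 1 ) ) pp( mem_msg );
  bool set_value(int val) {
    std::lock_guard<std::mutex> lock(mtx_);
    if (retval_ == 0 && val != 0) {
      retval_ = val;
      return true;
    }
    return false;
  }
};

// Let 'threads' workers report an error at the same time and count how
// many of them were told to print a diagnostic.
int count_reporters(int threads) {
  assert(threads > 0);
  SharedRetval shared_retval;
  std::atomic<int> reporters{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i)
    pool.emplace_back([&shared_retval, &reporters, i] {
      if (shared_retval.set_value(1 + i % 2)) ++reporters;  // 1 or 2
    });
  for (std::thread& t : pool) t.join();
  assert(shared_retval() == 1 || shared_retval() == 2);
  return reporters.load();
}
```

count_reporters( 8 ) lets eight threads race to report an error. Whatever the interleaving, exactly one of them gets true back, which keeps a failing multithreaded run from printing the same diagnostic several times.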
667
dec_stream.cc
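In dec_stream.cc, the hunks below delete the Boyer-Moore find_magic() and grow base_buffer by 4 bytes for a sentinel "LZIP". This matches the 1.9 ChangeLog entry about using plain comparison to search for headers. The sketch below shows the general sentinel technique under one assumption: the caller appends "LZIP" right after the data. find_magic_sentinel() and find_magic_in() are hypothetical names, and plzip's actual search loop is not part of this excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: return the offset of the first "LZIP" starting in
// buffer[pos, size), or 'size' if there is none. The caller must have
// stored the 4-byte sentinel "LZIP" at buffer[size], so the loop always
// stops without testing i against 'size'.
int find_magic_sentinel(const uint8_t* buffer, int pos, int size) {
  assert(pos >= 0 && pos <= size);
  int i = pos;
  while (!(buffer[i] == 'L' && buffer[i + 1] == 'Z' &&
           buffer[i + 2] == 'I' && buffer[i + 3] == 'P'))
    ++i;
  // A match cannot straddle the data and the sentinel: the sentinel starts
  // with 'L', which is none of 'Z', 'I', 'P'.
  return i;  // equals 'size' when only the sentinel matched
}

// Copy 'data' into a buffer with 4 spare bytes, append the sentinel,
// and search from 'pos'.
int find_magic_in(const std::string& data, int pos) {
  std::vector<uint8_t> buf(data.begin(), data.end());
  const std::string sentinel = "LZIP";
  buf.insert(buf.end(), sentinel.begin(), sentinel.end());
  return find_magic_sentinel(buf.data(), pos, static_cast<int>(data.size()));
}
```

With the sentinel in place, the loop needs no bounds test. For example, find_magic_in( "xxLZIPyy", 0 ) returns 2, while find_magic_in( "LZI", 0 ) returns 3, meaning not found. The partial "LZI" at the end of the data does not combine with the sentinel into a false hit.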
@ -1,19 +1,19 @@
/* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2019 Antonio Diaz Diaz.
/* Plzip - Massively parallel implementation of lzip
Copyright (C) 2009 Laszlo Ersek.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#define _FILE_OFFSET_BITS 64
@ -28,13 +28,19 @@
#include <queue>
#include <string>
#include <vector>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <lzlib.h>

#include "lzip.h"

/* When a problem is detected by any thread:
- the thread sets shared_retval to 1 or 2.
- the splitter sets eof and returns.
- the courier discards new packets received or collected.
- the workers drain the queue and return.
- the muxer drains the queue and returns.
(Draining seems to be faster than cleaning up later). */

namespace {
@ -45,10 +51,13 @@ unsigned long long out_size = 0;

struct Packet // data block
{
uint8_t * data; // data == 0 means end of member
uint8_t * data; // data may be null if size == 0
int size; // number of bytes in data (if any)
explicit Packet( uint8_t * const d = 0, const int s = 0 )
: data( d ), size( s ) {}
bool eom; // end of member
Packet() : data( 0 ), size( 0 ), eom( true ) {}
Packet( uint8_t * const d, const int s, const bool e )
: data( d ), size( s ), eom ( e ) {}
~Packet() { if( data ) delete[] data; }
};
@ -63,8 +72,8 @@ private:
int receive_worker_id; // worker queue currently receiving packets
int deliver_worker_id; // worker queue currently delivering packets
Slot_tally slot_tally; // limits the number of input packets
std::vector< std::queue< Packet * > > ipacket_queues;
std::vector< std::queue< Packet * > > opacket_queues;
std::vector< std::queue< const Packet * > > ipacket_queues;
std::vector< std::queue< const Packet * > > opacket_queues;
int num_working; // number of workers still running
const int num_workers; // number of workers
const unsigned out_slots; // max output packets per queue
@ -73,20 +82,23 @@ private:
pthread_mutex_t omutex;
pthread_cond_t oav_or_exit; // output packet available or all workers exited
std::vector< pthread_cond_t > slot_av; // output slot available
const Shared_retval & shared_retval; // discard new packets on error
bool eof; // splitter done
bool trailing_data_found_; // a worker found trailing data

Packet_courier( const Packet_courier & ); // declared as private
void operator=( const Packet_courier & ); // declared as private

public:
Packet_courier( const int workers, const int in_slots, const int oslots )
Packet_courier( const Shared_retval & sh_ret, const int workers,
const int in_slots, const int oslots )
: icheck_counter( 0 ), iwait_counter( 0 ),
ocheck_counter( 0 ), owait_counter( 0 ),
receive_worker_id( 0 ), deliver_worker_id( 0 ),
slot_tally( in_slots ), ipacket_queues( workers ),
opacket_queues( workers ), num_working( workers ),
num_workers( workers ), out_slots( oslots ), slot_av( workers ),
eof( false )
shared_retval( sh_ret ), eof( false ), trailing_data_found_( false )
{
xinit_mutex( &imutex ); xinit_cond( &iav_or_eof );
xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
@ -95,30 +107,37 @@ public:

~Packet_courier()
{
if( shared_retval() ) // cleanup to avoid memory leaks
for( int i = 0; i < num_workers; ++i )
{
while( !ipacket_queues[i].empty() )
{ delete ipacket_queues[i].front(); ipacket_queues[i].pop(); }
while( !opacket_queues[i].empty() )
{ delete opacket_queues[i].front(); opacket_queues[i].pop(); }
}
for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
xdestroy_cond( &iav_or_eof ); xdestroy_mutex( &imutex );
}

// make a packet with data received from splitter
// if data == 0 (end of member token), move to next queue
void receive_packet( uint8_t * const data, const int size )
/* Make a packet with data received from splitter.
If eom == true (end of member), move to next queue. */
void receive_packet( uint8_t * const data, const int size, const bool eom )
{
Packet * const ipacket = new Packet( data, size );
if( data )
{ in_size += size; slot_tally.get_slot(); } // wait for a free slot
if( shared_retval() ) { delete[] data; return; } // discard packet on error
const Packet * const ipacket = new Packet( data, size, eom );
slot_tally.get_slot(); // wait for a free slot
xlock( &imutex );
ipacket_queues[receive_worker_id].push( ipacket );
xbroadcast( &iav_or_eof );
xunlock( &imutex );
if( !data && ++receive_worker_id >= num_workers )
receive_worker_id = 0;
if( eom && ++receive_worker_id >= num_workers ) receive_worker_id = 0;
}

// distribute a packet to a worker
Packet * distribute_packet( const int worker_id )
const Packet * distribute_packet( const int worker_id )
{
Packet * ipacket = 0;
const Packet * ipacket = 0;
xlock( &imutex );
++icheck_counter;
while( ipacket_queues[worker_id].empty() && !eof )
@ -132,37 +151,38 @@ public:
ipacket_queues[worker_id].pop();
}
xunlock( &imutex );
if( ipacket )
{ if( ipacket->data ) slot_tally.leave_slot(); }
else
if( ipacket ) slot_tally.leave_slot();
else // no more packets
{
// notify muxer when last worker exits
xlock( &omutex );
xlock( &omutex ); // notify muxer when last worker exits
if( --num_working == 0 ) xsignal( &oav_or_exit );
xunlock( &omutex );
}
return ipacket;
}

// collect a packet from a worker
void collect_packet( Packet * const opacket, const int worker_id )
// collect a packet from a worker, discard packet on error
void collect_packet( const Packet * const opacket, const int worker_id )
{
xlock( &omutex );
if( opacket->data )
{
while( opacket_queues[worker_id].size() >= out_slots )
{
if( shared_retval() ) { delete opacket; goto done; }
xwait( &slot_av[worker_id], &omutex );
}
}
opacket_queues[worker_id].push( opacket );
if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
done:
xunlock( &omutex );
}

// deliver a packet to muxer
// if packet data == 0, move to next queue and wait again
Packet * deliver_packet()
/* deliver a packet to muxer
if packet->eom, move to next queue
if packet data == 0, wait again */
const Packet * deliver_packet()
{
Packet * opacket = 0;
const Packet * opacket = 0;
xlock( &omutex );
++ocheck_counter;
while( true )
@ -177,27 +197,37 @@ public:
opacket_queues[deliver_worker_id].pop();
if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
xsignal( &slot_av[deliver_worker_id] );
if( opacket->eom && ++deliver_worker_id >= num_workers )
deliver_worker_id = 0;
if( opacket->data ) break;
if( ++deliver_worker_id >= num_workers ) deliver_worker_id = 0;
delete opacket; opacket = 0;
}
xunlock( &omutex );
return opacket;
}

void add_out_size( const unsigned long long partial_out_size )
{
xlock( &omutex );
out_size += partial_out_size;
xunlock( &omutex );
}

void finish() // splitter has no more packets to send
void add_sizes( const unsigned long long partial_in_size,
const unsigned long long partial_out_size )
{
xlock( &imutex );
in_size += partial_in_size;
out_size += partial_out_size;
xunlock( &imutex );
}

void set_trailing_flag() { trailing_data_found_ = true; }
bool trailing_data_found() { return trailing_data_found_; }

void finish( const int workers_started )
{
xlock( &imutex ); // splitter has no more packets to send
eof = true;
xbroadcast( &iav_or_eof );
xunlock( &imutex );
xlock( &omutex ); // notify muxer if all workers have exited
num_working -= num_workers - workers_started; // workers spared
if( num_working <= 0 ) xsignal( &oav_or_exit );
xunlock( &omutex );
}

bool finished() // all packets delivered to muxer
@ -212,173 +242,60 @@ public:
};


// Search forward from 'pos' for "LZIP" (Boyer-Moore algorithm)
// Returns pos of found string or 'pos+size' if not found.
//
int find_magic( const uint8_t * const buffer, const int pos, const int size )
{
const uint8_t table[256] = {
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,1,4,4,3,4,4,4,4,4,4,4,4,4,4,4,4,4,2,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4 };

for( int i = pos; i <= pos + size - 4; i += table[buffer[i+3]] )
if( buffer[i] == 'L' && buffer[i+1] == 'Z' &&
buffer[i+2] == 'I' && buffer[i+3] == 'P' )
return i; // magic string found
return pos + size;
}


struct Splitter_arg
{
unsigned long long cfile_size;
Packet_courier * courier;
const Pretty_print * pp;
int infd;
unsigned dictionary_size; // returned by splitter to main thread
};


// split data from input file into chunks and pass them to
// courier for packaging and distribution to workers.
extern "C" void * dsplitter_s( void * arg )
{
Splitter_arg & tmp = *(Splitter_arg *)arg;
Packet_courier & courier = *tmp.courier;
const Pretty_print & pp = *tmp.pp;
const int infd = tmp.infd;
const int hsize = Lzip_header::size;
const int tsize = Lzip_trailer::size;
const int buffer_size = max_packet_size;
const int base_buffer_size = tsize + buffer_size + hsize;
uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size];
if( !base_buffer ) { pp( "Not enough memory." ); cleanup_and_fail(); }
uint8_t * const buffer = base_buffer + tsize;

int size = readblock( infd, buffer, buffer_size + hsize ) - hsize;
bool at_stream_end = ( size < buffer_size );
if( size != buffer_size && errno )
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
if( size + hsize < min_member_size )
{ show_file_error( pp.name(), "Input file is too short." );
cleanup_and_fail( 2 ); }
const Lzip_header & header = *(const Lzip_header *)buffer;
if( !header.verify_magic() )
{ show_file_error( pp.name(), bad_magic_msg ); cleanup_and_fail( 2 ); }
if( !header.verify_version() )
{ pp( bad_version( header.version() ) ); cleanup_and_fail( 2 ); }
tmp.dictionary_size = header.dictionary_size();
if( !isvalid_ds( tmp.dictionary_size ) )
{ pp( bad_dict_msg ); cleanup_and_fail( 2 ); }
if( verbosity >= 1 ) pp();
show_progress( 0, tmp.cfile_size, &pp ); // init

unsigned long long partial_member_size = 0;
while( true )
{
int pos = 0;
for( int newpos = 1; newpos <= size; ++newpos )
{
newpos = find_magic( buffer, newpos, size + 4 - newpos );
if( newpos <= size )
{
const Lzip_trailer & trailer =
*(const Lzip_trailer *)(buffer + newpos - tsize);
const unsigned long long member_size = trailer.member_size();
if( partial_member_size + newpos - pos == member_size )
{ // header found
const Lzip_header & header = *(const Lzip_header *)(buffer + newpos);
if( !header.verify_version() )
{ pp( bad_version( header.version() ) ); cleanup_and_fail( 2 ); }
const unsigned dictionary_size = header.dictionary_size();
if( !isvalid_ds( dictionary_size ) )
{ pp( bad_dict_msg ); cleanup_and_fail( 2 ); }
uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos];
if( !data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
std::memcpy( data, buffer + pos, newpos - pos );
courier.receive_packet( data, newpos - pos );
courier.receive_packet( 0, 0 ); // end of member token
partial_member_size = 0;
pos = newpos;
show_progress( member_size );
}
}
}

if( at_stream_end )
{
uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos];
if( !data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
std::memcpy( data, buffer + pos, size + hsize - pos );
courier.receive_packet( data, size + hsize - pos );
courier.receive_packet( 0, 0 ); // end of member token
break;
}
if( pos < buffer_size )
{
partial_member_size += buffer_size - pos;
uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos];
if( !data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
std::memcpy( data, buffer + pos, buffer_size - pos );
courier.receive_packet( data, buffer_size - pos );
}
std::memcpy( base_buffer, base_buffer + buffer_size, tsize + hsize );
size = readblock( infd, buffer + hsize, buffer_size );
at_stream_end = ( size < buffer_size );
if( size != buffer_size && errno )
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
}
delete[] base_buffer;
courier.finish(); // no more packets to send
return 0;
}


struct Worker_arg
{
Packet_courier * courier;
const Pretty_print * pp;
Shared_retval * shared_retval;
int worker_id;
bool ignore_trailing;
bool loose_trailing;
bool testing;
bool nocopy; // avoid copying decompressed data when testing
};

struct Splitter_arg
{
struct Worker_arg worker_arg;
Worker_arg * worker_args;
pthread_t * worker_threads;
unsigned long long cfile_size;
int infd;
unsigned dictionary_size; // returned by splitter to main thread
int num_workers; // returned by splitter to main thread
};


// consume packets from courier, decompress their contents and,
// if not testing, give the produced packets to courier.
/* Consume packets from courier, decompress their contents and, if not
testing, give to courier the packets produced.
*/
extern "C" void * dworker_s( void * arg )
{
const Worker_arg & tmp = *(const Worker_arg *)arg;
Packet_courier & courier = *tmp.courier;
const Pretty_print & pp = *tmp.pp;
Shared_retval & shared_retval = *tmp.shared_retval;
const int worker_id = tmp.worker_id;
const bool ignore_trailing = tmp.ignore_trailing;
const bool loose_trailing = tmp.loose_trailing;
const bool testing = tmp.testing;
const bool nocopy = tmp.nocopy;

uint8_t * new_data = new( std::nothrow ) uint8_t[max_packet_size];
LZ_Decoder * const decoder = LZ_decompress_open();
if( !new_data || !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
{ pp( "Not enough memory." ); cleanup_and_fail(); }
unsigned long long partial_out_size = 0;
unsigned long long partial_in_size = 0, partial_out_size = 0;
int new_pos = 0;
bool trailing_data_found = false;
bool draining = false; // either trailing data or an error were found
uint8_t * new_data = 0;
LZ_Decoder * const decoder = LZ_decompress_open();
if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
{ draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg ); }

while( true )
{
const Packet * const ipacket = courier.distribute_packet( worker_id );
if( !ipacket ) break; // no more packets to process
if( !ipacket->data ) LZ_decompress_finish( decoder );

int written = 0;
while( !trailing_data_found )
while( !draining ) // else discard trailing data or drain queue
{
if( LZ_decompress_write_size( decoder ) > 0 && written < ipacket->size )
{
@ -389,85 +306,255 @@ extern "C" void * dworker_s( void * arg )
if( written > ipacket->size )
internal_error( "ipacket size exceeded in worker." );
}
while( !trailing_data_found ) // read and pack decompressed data
if( ipacket->eom && written == ipacket->size )
LZ_decompress_finish( decoder );
unsigned long long total_in = 0; // detect empty member + corrupt header
while( !draining ) // read and pack decompressed data
{
const int rd = LZ_decompress_read( decoder, new_data + new_pos,
if( !nocopy && !new_data &&
!( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) )
{ draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg );
break; }
const int rd = LZ_decompress_read( decoder,
nocopy ? 0 : new_data + new_pos,
max_packet_size - new_pos );
if( rd < 0 )
if( rd < 0 ) // trailing data or decoder error
{
draining = true;
const enum LZ_Errno lz_errno = LZ_decompress_errno( decoder );
if( lz_errno == LZ_header_error )
{
trailing_data_found = true;
courier.set_trailing_flag();
if( !ignore_trailing )
{ pp( trailing_msg ); cleanup_and_fail( 2 ); }
{ if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); }
}
else if( lz_errno == LZ_data_error &&
LZ_decompress_member_position( decoder ) == 0 )
{
trailing_data_found = true;
courier.set_trailing_flag();
if( !loose_trailing )
{ pp( corrupt_mm_msg ); cleanup_and_fail( 2 ); }
{ if( shared_retval.set_value( 2 ) ) pp( corrupt_mm_msg ); }
else if( !ignore_trailing )
{ pp( trailing_msg ); cleanup_and_fail( 2 ); }
{ if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); }
}
else
cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
decompress_error( decoder, pp, shared_retval, worker_id );
}
else new_pos += rd;
if( new_pos > max_packet_size )
internal_error( "opacket size exceeded in worker." );
if( new_pos == max_packet_size || trailing_data_found ||
LZ_decompress_finished( decoder ) == 1 )
if( LZ_decompress_member_finished( decoder ) == 1 )
{
if( !testing && new_pos > 0 ) // make data packet
{
Packet * const opacket = new Packet( new_data, new_pos );
courier.collect_packet( opacket, worker_id );
new_data = new( std::nothrow ) uint8_t[max_packet_size];
if( !new_data ) { pp( "Not enough memory." ); cleanup_and_fail(); }
}
partial_out_size += new_pos;
new_pos = 0;
if( trailing_data_found || LZ_decompress_finished( decoder ) == 1 )
{
if( !testing ) // end of member token
courier.collect_packet( new Packet, worker_id );
LZ_decompress_reset( decoder ); // prepare for new member
break;
}
partial_in_size += LZ_decompress_member_position( decoder );
partial_out_size += LZ_decompress_data_position( decoder );
}
const bool eom = draining || LZ_decompress_finished( decoder ) == 1;
if( new_pos == max_packet_size || eom )
{
if( !testing ) // make data packet
{
const Packet * const opacket =
new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
courier.collect_packet( opacket, worker_id );
if( new_pos > 0 ) new_data = 0;
}
new_pos = 0;
if( eom )
{ LZ_decompress_reset( decoder ); // prepare for new member
break; }
}
if( rd == 0 )
{
const unsigned long long size = LZ_decompress_total_in_size( decoder );
if( total_in == size ) break; else total_in = size;
}
if( rd == 0 ) break;
}
if( !ipacket->data || written == ipacket->size ) break;
}
if( ipacket->data ) delete[] ipacket->data;
delete ipacket;
}

delete[] new_data;
courier.add_out_size( partial_out_size );
if( LZ_decompress_member_position( decoder ) != 0 )
{ pp( "Error, some data remains in decoder." ); cleanup_and_fail(); }
if( LZ_decompress_close( decoder ) < 0 )
{ pp( "LZ_decompress_close failed." ); cleanup_and_fail(); }
if( new_data ) delete[] new_data;
courier.add_sizes( partial_in_size, partial_out_size );
if( LZ_decompress_member_position( decoder ) != 0 &&
shared_retval.set_value( 1 ) )
pp( "Error, some data remains in decoder." );
if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
pp( "LZ_decompress_close failed." );
return 0;
}


// get from courier the processed and sorted packets, and write
// their contents to the output file.
void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
bool start_worker( const Worker_arg & worker_arg,
Worker_arg * const worker_args,
pthread_t * const worker_threads, const int worker_id,
Shared_retval & shared_retval )
{
worker_args[worker_id] = worker_arg;
worker_args[worker_id].worker_id = worker_id;
const int errcode = pthread_create( &worker_threads[worker_id], 0,
dworker_s, &worker_args[worker_id] );
if( errcode && shared_retval.set_value( 1 ) )
show_error( "Can't create worker threads", errcode );
return errcode == 0;
}


/* Split data from input file into chunks and pass them to courier for
packaging and distribution to workers.
Start a worker per member up to a maximum of num_workers.
*/
extern "C" void * dsplitter_s( void * arg )
||||
{
|
||||
Splitter_arg & tmp = *(Splitter_arg *)arg;
|
||||
const Worker_arg & worker_arg = tmp.worker_arg;
|
||||
Packet_courier & courier = *worker_arg.courier;
|
||||
const Pretty_print & pp = *worker_arg.pp;
|
||||
Shared_retval & shared_retval = *worker_arg.shared_retval;
|
||||
Worker_arg * const worker_args = tmp.worker_args;
|
||||
pthread_t * const worker_threads = tmp.worker_threads;
|
||||
const int infd = tmp.infd;
|
||||
int worker_id = 0; // number of workers started
|
||||
const int hsize = Lzip_header::size;
|
||||
const int tsize = Lzip_trailer::size;
|
||||
const int buffer_size = max_packet_size;
|
||||
// buffer with room for trailer, header, data, and sentinel "LZIP"
|
||||
const int base_buffer_size = tsize + hsize + buffer_size + 4;
|
||||
uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size];
|
||||
if( !base_buffer )
|
||||
{
|
||||
mem_fail:
|
||||
if( shared_retval.set_value( 1 ) ) pp( mem_msg );
|
||||
fail:
|
||||
delete[] base_buffer;
|
||||
courier.finish( worker_id ); // no more packets to send
|
||||
tmp.num_workers = worker_id;
|
||||
return 0;
|
||||
}
|
||||
uint8_t * const buffer = base_buffer + tsize;
|
||||
|
||||
int size = readblock( infd, buffer, buffer_size + hsize ) - hsize;
|
||||
bool at_stream_end = ( size < buffer_size );
|
||||
if( size != buffer_size && errno )
|
||||
{ if( shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( "Read error", errno ); } goto fail; }
|
||||
if( size + hsize < min_member_size )
|
||||
{ if( shared_retval.set_value( 2 ) ) show_file_error( pp.name(),
|
||||
( size <= 0 ) ? "File ends unexpectedly at member header." :
|
||||
"Input file is too short." ); goto fail; }
|
||||
const Lzip_header & header = *(const Lzip_header *)buffer;
|
||||
if( !header.verify_magic() )
|
||||
{ if( shared_retval.set_value( 2 ) )
|
||||
{ show_file_error( pp.name(), bad_magic_msg ); } goto fail; }
|
||||
if( !header.verify_version() )
|
||||
{ if( shared_retval.set_value( 2 ) )
|
||||
{ pp( bad_version( header.version() ) ); } goto fail; }
|
||||
tmp.dictionary_size = header.dictionary_size();
|
||||
if( !isvalid_ds( tmp.dictionary_size ) )
|
||||
{ if( shared_retval.set_value( 2 ) ) { pp( bad_dict_msg ); } goto fail; }
|
||||
if( verbosity >= 1 ) pp();
|
||||
show_progress( 0, tmp.cfile_size, &pp ); // init
|
||||
|
||||
unsigned long long partial_member_size = 0;
|
||||
bool worker_pending = true; // start 1 worker per first packet of member
|
||||
while( true )
|
||||
{
|
||||
if( shared_retval() ) break; // stop sending packets on error
|
||||
int pos = 0; // current searching position
|
||||
std::memcpy( buffer + hsize + size, lzip_magic, 4 ); // sentinel
|
||||
for( int newpos = 1; newpos <= size; ++newpos )
|
||||
{
|
||||
while( buffer[newpos] != lzip_magic[0] ||
|
||||
buffer[newpos+1] != lzip_magic[1] ||
|
||||
buffer[newpos+2] != lzip_magic[2] ||
|
||||
buffer[newpos+3] != lzip_magic[3] ) ++newpos;
|
||||
if( newpos <= size )
|
||||
{
|
||||
const Lzip_trailer & trailer =
|
||||
*(const Lzip_trailer *)(buffer + newpos - tsize);
|
||||
const unsigned long long member_size = trailer.member_size();
|
||||
if( partial_member_size + newpos - pos == member_size &&
|
||||
trailer.verify_consistency() )
|
||||
{ // header found
|
||||
const Lzip_header & header = *(const Lzip_header *)(buffer + newpos);
|
||||
if( !header.verify_version() )
|
||||
{ if( shared_retval.set_value( 2 ) )
|
||||
{ pp( bad_version( header.version() ) ); } goto fail; }
|
||||
const unsigned dictionary_size = header.dictionary_size();
|
||||
if( !isvalid_ds( dictionary_size ) )
|
||||
{ if( shared_retval.set_value( 2 ) ) pp( bad_dict_msg );
|
||||
goto fail; }
|
||||
if( tmp.dictionary_size < dictionary_size )
|
||||
tmp.dictionary_size = dictionary_size;
|
||||
uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos];
|
||||
if( !data ) goto mem_fail;
|
||||
std::memcpy( data, buffer + pos, newpos - pos );
|
||||
courier.receive_packet( data, newpos - pos, true ); // eom
|
||||
partial_member_size = 0;
|
||||
pos = newpos;
|
||||
if( worker_pending )
|
||||
{ if( !start_worker( worker_arg, worker_args, worker_threads,
|
||||
worker_id, shared_retval ) ) goto fail;
|
||||
++worker_id; }
|
||||
worker_pending = worker_id < tmp.num_workers;
|
||||
show_progress( member_size );
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if( at_stream_end )
|
||||
{
|
||||
uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos];
|
||||
if( !data ) goto mem_fail;
|
||||
std::memcpy( data, buffer + pos, size + hsize - pos );
|
||||
courier.receive_packet( data, size + hsize - pos, true ); // eom
|
||||
if( worker_pending &&
|
||||
start_worker( worker_arg, worker_args, worker_threads,
|
||||
worker_id, shared_retval ) ) ++worker_id;
|
||||
break;
|
||||
}
|
||||
if( pos < buffer_size )
|
||||
{
|
||||
partial_member_size += buffer_size - pos;
|
||||
uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos];
|
||||
if( !data ) goto mem_fail;
|
||||
std::memcpy( data, buffer + pos, buffer_size - pos );
|
||||
courier.receive_packet( data, buffer_size - pos, false );
|
||||
if( worker_pending )
|
||||
{ if( !start_worker( worker_arg, worker_args, worker_threads,
|
||||
worker_id, shared_retval ) ) break;
|
||||
++worker_id; worker_pending = false; }
|
||||
}
|
||||
if( courier.trailing_data_found() ) break;
|
||||
std::memcpy( base_buffer, base_buffer + buffer_size, tsize + hsize );
|
||||
size = readblock( infd, buffer + hsize, buffer_size );
|
||||
at_stream_end = ( size < buffer_size );
|
||||
if( size != buffer_size && errno )
|
||||
{ if( shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( "Read error", errno ); } break; }
|
||||
}
|
||||
delete[] base_buffer;
|
||||
courier.finish( worker_id ); // no more packets to send
|
||||
tmp.num_workers = worker_id;
|
||||
return 0;
|
||||
}
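The splitter finds member boundaries by scanning for the 4-byte magic "LZIP". Before each scan it copies the magic just past the valid bytes as a sentinel. The inner `while` loop therefore needs no bounds check: at worst it stops at the sentinel, and the `newpos <= size` test then rejects that hit.

The sketch below isolates only this scanning step and works on a plain byte array. `find_magic_candidates` is a hypothetical name. It leaves out the trailer `member_size()` and `verify_consistency()` checks that the splitter uses to accept a candidate as a real header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of plzip): return every offset in 'data'
// where the 4-byte lzip magic "LZIP" starts and fits inside 'size' bytes.
std::vector<int> find_magic_candidates( const uint8_t * const data, const int size )
  {
  static const uint8_t magic[4] = { 'L', 'Z', 'I', 'P' };
  // Copy the data and append the magic as a sentinel, so the inner loop
  // below always terminates without checking bounds.
  std::vector<uint8_t> buf( size + 4 );
  if( size > 0 ) std::memcpy( buf.data(), data, size );
  std::memcpy( buf.data() + size, magic, 4 );

  std::vector<int> hits;
  for( int pos = 0; ; ++pos )
    {
    while( buf[pos] != magic[0] || buf[pos+1] != magic[1] ||
           buf[pos+2] != magic[2] || buf[pos+3] != magic[3] ) ++pos;
    if( pos + 4 > size ) break;   // this match is the sentinel, not data
    hits.push_back( pos );        // candidate header; caller must verify it
    }
  return hits;
  }
```

A false match that straddles the end of the data and the sentinel cannot occur, because no proper suffix of "LZIP" is also a prefix of it.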
|
||||
|
||||
|
||||
/* Get from courier the processed and sorted packets, and write their
|
||||
contents to the output file. Drain queue on error.
|
||||
*/
|
||||
void muxer( Packet_courier & courier, const Pretty_print & pp,
|
||||
Shared_retval & shared_retval, const int outfd )
|
||||
{
|
||||
while( true )
|
||||
{
|
||||
Packet * const opacket = courier.deliver_packet();
|
||||
const Packet * const opacket = courier.deliver_packet();
|
||||
if( !opacket ) break; // queue is empty. all workers exited
|
||||
|
||||
const int wr = writeblock( outfd, opacket->data, opacket->size );
|
||||
if( wr != opacket->size )
|
||||
{ pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
|
||||
delete[] opacket->data;
|
||||
if( shared_retval() == 0 &&
|
||||
writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
|
||||
shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( "Write error", errno ); }
|
||||
delete opacket;
|
||||
}
|
||||
}
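`Shared_retval` is not defined in this excerpt. Its use here implies a "first error wins" contract:

- `set_value()` returns true only for the call that records the first error, so exactly one thread prints a diagnostic.
- After an error, the muxer keeps calling `deliver_packet()` but skips the write. It drains the queue instead of abandoning it.

The sketch below shows one way to implement such a class, assuming a mutex-protected int. The name `First_error` is hypothetical, and plzip's real class may differ.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for Shared_retval (the real definition is not in
// this diff). The first positive value stored wins; set_value() tells the
// caller whether it was the one that stored it, so that only that caller
// prints an error message.
class First_error
  {
  mutable std::mutex mutex_;
  int retval_ = 0;

public:
  int operator()() const
    { std::lock_guard<std::mutex> lock( mutex_ ); return retval_; }

  bool set_value( const int val )
    {
    std::lock_guard<std::mutex> lock( mutex_ );
    if( retval_ != 0 || val <= 0 ) return false;  // keep the first error
    retval_ = val;
    return true;
    }
  };
```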
|
||||
|
@@ -475,8 +562,9 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
|
|||
} // end namespace
|
||||
|
||||
|
||||
// init the courier, then start the splitter and the workers and,
|
||||
// if not testing, call the muxer.
|
||||
/* Init the courier, then start the splitter and the workers and, if not
|
||||
testing, call the muxer.
|
||||
*/
|
||||
int dec_stream( const unsigned long long cfile_size,
|
||||
const int num_workers, const int infd, const int outfd,
|
||||
const Pretty_print & pp, const int debug_level,
|
||||
|
@@ -487,77 +575,76 @@ int dec_stream( const unsigned long long cfile_size,
|
|||
num_workers * in_slots : INT_MAX;
|
||||
in_size = 0;
|
||||
out_size = 0;
|
||||
Packet_courier courier( num_workers, total_in_slots, out_slots );
|
||||
Shared_retval shared_retval;
|
||||
Packet_courier courier( shared_retval, num_workers, total_in_slots, out_slots );
|
||||
|
||||
Splitter_arg splitter_arg;
|
||||
splitter_arg.cfile_size = cfile_size;
|
||||
splitter_arg.courier = &courier;
|
||||
splitter_arg.pp = &pp;
|
||||
splitter_arg.infd = infd;
|
||||
|
||||
pthread_t splitter_thread;
|
||||
int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg );
|
||||
if( errcode )
|
||||
{ show_error( "Can't create splitter thread", errcode ); cleanup_and_fail(); }
|
||||
if( debug_level & 2 ) std::fputs( "decompress stream.\n", stderr );
|
||||
|
||||
Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
|
||||
pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
|
||||
if( !worker_args || !worker_threads )
|
||||
{ pp( "Not enough memory." ); cleanup_and_fail(); }
|
||||
for( int i = 0; i < num_workers; ++i )
|
||||
{
|
||||
worker_args[i].courier = &courier;
|
||||
worker_args[i].pp = &pp;
|
||||
worker_args[i].worker_id = i;
|
||||
worker_args[i].ignore_trailing = ignore_trailing;
|
||||
worker_args[i].loose_trailing = loose_trailing;
|
||||
worker_args[i].testing = ( outfd < 0 );
|
||||
errcode = pthread_create( &worker_threads[i], 0, dworker_s, &worker_args[i] );
|
||||
if( errcode )
|
||||
{ show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
|
||||
}
|
||||
{ pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }
|
||||
|
||||
if( outfd >= 0 ) muxer( courier, pp, outfd );
|
||||
#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
|
||||
const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 );
|
||||
#else
|
||||
const bool nocopy = false;
|
||||
#endif
|
||||
|
||||
for( int i = num_workers - 1; i >= 0; --i )
|
||||
{
|
||||
Splitter_arg splitter_arg;
|
||||
splitter_arg.worker_arg.courier = &courier;
|
||||
splitter_arg.worker_arg.pp = &pp;
|
||||
splitter_arg.worker_arg.shared_retval = &shared_retval;
|
||||
splitter_arg.worker_arg.worker_id = 0;
|
||||
splitter_arg.worker_arg.ignore_trailing = ignore_trailing;
|
||||
splitter_arg.worker_arg.loose_trailing = loose_trailing;
|
||||
splitter_arg.worker_arg.testing = ( outfd < 0 );
|
||||
splitter_arg.worker_arg.nocopy = nocopy;
|
||||
splitter_arg.worker_args = worker_args;
|
||||
splitter_arg.worker_threads = worker_threads;
|
||||
splitter_arg.cfile_size = cfile_size;
|
||||
splitter_arg.infd = infd;
|
||||
splitter_arg.num_workers = num_workers;
|
||||
|
||||
pthread_t splitter_thread;
|
||||
int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg );
|
||||
if( errcode )
|
||||
{ show_error( "Can't create splitter thread", errcode );
|
||||
delete[] worker_threads; delete[] worker_args; return 1; }
|
||||
|
||||
if( outfd >= 0 ) muxer( courier, pp, shared_retval, outfd );
|
||||
|
||||
errcode = pthread_join( splitter_thread, 0 );
|
||||
if( errcode && shared_retval.set_value( 1 ) )
|
||||
show_error( "Can't join splitter thread", errcode );
|
||||
|
||||
for( int i = splitter_arg.num_workers; --i >= 0; )
|
||||
{ // join only the workers started
|
||||
errcode = pthread_join( worker_threads[i], 0 );
|
||||
if( errcode )
|
||||
{ show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
|
||||
if( errcode && shared_retval.set_value( 1 ) )
|
||||
show_error( "Can't join worker threads", errcode );
|
||||
}
|
||||
delete[] worker_threads;
|
||||
delete[] worker_args;
|
||||
|
||||
errcode = pthread_join( splitter_thread, 0 );
|
||||
if( errcode )
|
||||
{ show_error( "Can't join splitter thread", errcode ); cleanup_and_fail(); }
|
||||
if( shared_retval() ) return shared_retval(); // some thread found a problem
|
||||
|
||||
if( verbosity >= 2 )
|
||||
{
|
||||
if( verbosity >= 4 ) show_header( splitter_arg.dictionary_size );
|
||||
if( out_size == 0 || in_size == 0 )
|
||||
std::fputs( "no data compressed. ", stderr );
|
||||
else
|
||||
std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
|
||||
(double)out_size / in_size,
|
||||
( 100.0 * in_size ) / out_size,
|
||||
100.0 - ( ( 100.0 * in_size ) / out_size ) );
|
||||
if( verbosity >= 3 )
|
||||
std::fprintf( stderr, "decompressed %9llu, compressed %8llu. ",
|
||||
out_size, in_size );
|
||||
}
|
||||
if( verbosity >= 1 ) std::fputs( (outfd < 0) ? "ok\n" : "done\n", stderr );
|
||||
show_results( in_size, out_size, splitter_arg.dictionary_size, outfd < 0 );
|
||||
|
||||
if( debug_level & 1 )
|
||||
{
|
||||
std::fprintf( stderr,
|
||||
"workers started %8u\n"
|
||||
"any worker tried to consume from splitter %8u times\n"
|
||||
"any worker had to wait %8u times\n"
|
||||
"muxer tried to consume from workers %8u times\n"
|
||||
"muxer had to wait %8u times\n",
|
||||
courier.icheck_counter,
|
||||
courier.iwait_counter,
|
||||
courier.ocheck_counter,
|
||||
courier.owait_counter );
|
||||
"any worker had to wait %8u times\n",
|
||||
splitter_arg.num_workers,
|
||||
courier.icheck_counter, courier.iwait_counter );
|
||||
if( outfd >= 0 )
|
||||
std::fprintf( stderr,
|
||||
"muxer tried to consume from workers %8u times\n"
|
||||
"muxer had to wait %8u times\n",
|
||||
courier.ocheck_counter, courier.owait_counter );
|
||||
}
|
||||
|
||||
if( !courier.finished() ) internal_error( "courier not finished." );
|
||||
return 0;
|
||||
|
|
214
decompress.cc
|
@@ -1,19 +1,19 @@
|
|||
/* Plzip - Massively parallel implementation of lzip
|
||||
Copyright (C) 2009 Laszlo Ersek.
|
||||
Copyright (C) 2009-2019 Antonio Diaz Diaz.
|
||||
/* Plzip - Massively parallel implementation of lzip
|
||||
Copyright (C) 2009 Laszlo Ersek.
|
||||
Copyright (C) 2009-2021 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
*/
|
||||
|
||||
#define _FILE_OFFSET_BITS 64
|
||||
|
@@ -27,7 +27,6 @@
|
|||
#include <cstring>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
#include <pthread.h>
|
||||
#include <stdint.h>
|
||||
#include <unistd.h>
|
||||
#include <sys/stat.h>
|
||||
|
@@ -37,8 +36,9 @@
|
|||
#include "lzip_index.h"
|
||||
|
||||
|
||||
// This code is based on a patch by Hannes Domani, ssbssa@yahoo.de
|
||||
// to be able to compile plzip under MS Windows (with MINGW compiler).
|
||||
/* This code is based on a patch by Hannes Domani, <ssbssa@yahoo.de> to make
|
||||
possible compiling plzip under MS Windows (with MINGW compiler).
|
||||
*/
|
||||
#if defined(__MSVCRT__) && defined(WITH_MINGW)
|
||||
#include <windows.h>
|
||||
#warning "Parallel I/O is not guaranteed to work on Windows."
|
||||
|
@@ -76,9 +76,9 @@ ssize_t pwrite( int fd, const void *buf, size_t count, uint64_t offset )
|
|||
#endif // __MSVCRT__
|
||||
|
||||
|
||||
// Returns the number of bytes really read.
|
||||
// If (returned value < size) and (errno == 0), means EOF was reached.
|
||||
//
|
||||
/* Returns the number of bytes really read.
|
||||
If (returned value < size) and (errno == 0), means EOF was reached.
|
||||
*/
|
||||
int preadblock( const int fd, uint8_t * const buf, const int size,
|
||||
const long long pos )
|
||||
{
|
||||
|
@@ -96,9 +96,9 @@ int preadblock( const int fd, uint8_t * const buf, const int size,
|
|||
}
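The comment above states the contract for `preadblock`: a short count with `errno == 0` means EOF, and any other short count is an error. The splitter's `size != buffer_size && errno` test relies on the same convention for `readblock`. The body of `preadblock` is not shown in this diff.

The sketch below shows one way to meet that contract with POSIX `pread`, retrying on `EINTR`. `pread_fully` is a hypothetical name.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <unistd.h>

// Hypothetical sketch: read up to 'size' bytes at offset 'pos'. Returns the
// number of bytes read; if that is less than 'size', errno == 0 means EOF
// was reached, otherwise errno describes the error.
int pread_fully( const int fd, uint8_t * const buf, const int size,
                 const long long pos )
  {
  int sz = 0;
  errno = 0;
  while( sz < size )
    {
    const ssize_t n = pread( fd, buf + sz, size - sz, pos + sz );
    if( n > 0 ) sz += n;
    else if( n == 0 ) break;               // EOF reached; errno is still 0
    else if( errno == EINTR ) errno = 0;   // interrupted: clear and retry
    else break;                            // real error; errno is set
    }
  return sz;
  }
```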
|
||||
|
||||
|
||||
// Returns the number of bytes really written.
|
||||
// If (returned value < size), it is always an error.
|
||||
//
|
||||
/* Returns the number of bytes really written.
|
||||
If (returned value < size), it is always an error.
|
||||
*/
|
||||
int pwriteblock( const int fd, const uint8_t * const buf, const int size,
|
||||
const long long pos )
|
||||
{
|
||||
|
@@ -115,18 +115,39 @@ int pwriteblock( const int fd, const uint8_t * const buf, const int size,
|
|||
}
|
||||
|
||||
|
||||
int decompress_read_error( struct LZ_Decoder * const decoder,
|
||||
const Pretty_print & pp, const int worker_id )
|
||||
void decompress_error( struct LZ_Decoder * const decoder,
|
||||
const Pretty_print & pp,
|
||||
Shared_retval & shared_retval, const int worker_id )
|
||||
{
|
||||
const LZ_Errno errcode = LZ_decompress_errno( decoder );
|
||||
const int retval = ( errcode == LZ_header_error || errcode == LZ_data_error ||
|
||||
errcode == LZ_unexpected_eof ) ? 2 : 1;
|
||||
if( !shared_retval.set_value( retval ) ) return;
|
||||
pp();
|
||||
if( verbosity >= 0 )
|
||||
std::fprintf( stderr, "LZ_decompress_read error in worker %d: %s\n",
|
||||
worker_id, LZ_strerror( errcode ) );
|
||||
if( errcode == LZ_header_error || errcode == LZ_unexpected_eof ||
|
||||
errcode == LZ_data_error )
|
||||
return 2;
|
||||
return 1;
|
||||
std::fprintf( stderr, "%s in worker %d\n", LZ_strerror( errcode ),
|
||||
worker_id );
|
||||
}
|
||||
|
||||
|
||||
void show_results( const unsigned long long in_size,
|
||||
const unsigned long long out_size,
|
||||
const unsigned dictionary_size, const bool testing )
|
||||
{
|
||||
if( verbosity >= 2 )
|
||||
{
|
||||
if( verbosity >= 4 ) show_header( dictionary_size );
|
||||
if( out_size == 0 || in_size == 0 )
|
||||
std::fputs( "no data compressed. ", stderr );
|
||||
else
|
||||
std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
|
||||
(double)out_size / in_size,
|
||||
( 100.0 * in_size ) / out_size,
|
||||
100.0 - ( ( 100.0 * in_size ) / out_size ) );
|
||||
if( verbosity >= 3 )
|
||||
std::fprintf( stderr, "%9llu out, %8llu in. ", out_size, in_size );
|
||||
}
|
||||
if( verbosity >= 1 ) std::fputs( testing ? "ok\n" : "done\n", stderr );
|
||||
}
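`show_results` prints three figures derived from the compressed size `in_size` and the decompressed size `out_size`:

- the expansion factor `out_size / in_size`;
- the compressed size as a percentage of the decompressed size;
- the complement of that percentage, shown as space saved.

For example, 25 bytes that decompress to 100 bytes give 4:1, a 25% ratio and 75% saved. The hypothetical helper below formats the same string into a `std::string` so the arithmetic can be checked.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper reproducing the arithmetic of show_results:
// in_size is the compressed size, out_size the decompressed size.
std::string ratio_line( const unsigned long long in_size,
                        const unsigned long long out_size )
  {
  if( out_size == 0 || in_size == 0 ) return "no data compressed. ";
  char buf[80];
  std::snprintf( buf, sizeof buf, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
                 (double)out_size / in_size,                  // expansion
                 ( 100.0 * in_size ) / out_size,              // ratio
                 100.0 - ( ( 100.0 * in_size ) / out_size ) ); // saved
  return buf;
  }
```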
|
||||
|
||||
|
||||
|
@@ -136,32 +157,38 @@ struct Worker_arg
|
|||
{
|
||||
const Lzip_index * lzip_index;
|
||||
const Pretty_print * pp;
|
||||
Shared_retval * shared_retval;
|
||||
int worker_id;
|
||||
int num_workers;
|
||||
int infd;
|
||||
int outfd;
|
||||
bool nocopy; // avoid copying decompressed data when testing
|
||||
};
|
||||
|
||||
|
||||
// read members from file, decompress their contents, and
|
||||
// write the produced data to file.
|
||||
/* Read members from input file, decompress their contents, and write to
|
||||
output file the data produced.
|
||||
*/
|
||||
extern "C" void * dworker( void * arg )
|
||||
{
|
||||
const Worker_arg & tmp = *(const Worker_arg *)arg;
|
||||
const Lzip_index & lzip_index = *tmp.lzip_index;
|
||||
const Pretty_print & pp = *tmp.pp;
|
||||
Shared_retval & shared_retval = *tmp.shared_retval;
|
||||
const int worker_id = tmp.worker_id;
|
||||
const int num_workers = tmp.num_workers;
|
||||
const int infd = tmp.infd;
|
||||
const int outfd = tmp.outfd;
|
||||
const bool nocopy = tmp.nocopy;
|
||||
const int buffer_size = 65536;
|
||||
|
||||
uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size];
|
||||
uint8_t * const obuffer = new( std::nothrow ) uint8_t[buffer_size];
|
||||
uint8_t * const obuffer =
|
||||
nocopy ? 0 : new( std::nothrow ) uint8_t[buffer_size];
|
||||
LZ_Decoder * const decoder = LZ_decompress_open();
|
||||
if( !ibuffer || !obuffer || !decoder ||
|
||||
if( !ibuffer || ( !nocopy && !obuffer ) || !decoder ||
|
||||
LZ_decompress_errno( decoder ) != LZ_ok )
|
||||
{ pp( "Not enough memory." ); cleanup_and_fail(); }
|
||||
{ if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
|
||||
|
||||
for( long i = worker_id; i < lzip_index.members(); i += num_workers )
|
||||
{
|
||||
|
@@ -172,6 +199,7 @@ extern "C" void * dworker( void * arg )
|
|||
|
||||
while( member_rest > 0 )
|
||||
{
|
||||
if( shared_retval() ) goto done; // other worker found a problem
|
||||
while( LZ_decompress_write_size( decoder ) > 0 )
|
||||
{
|
||||
const int size = std::min( LZ_decompress_write_size( decoder ),
|
||||
|
@@ -179,7 +207,8 @@ extern "C" void * dworker( void * arg )
|
|||
if( size > 0 )
|
||||
{
|
||||
if( preadblock( infd, ibuffer, size, member_pos ) != size )
|
||||
{ pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
|
||||
{ if( shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( "Read error", errno ); } goto done; }
|
||||
member_pos += size;
|
||||
member_rest -= size;
|
||||
if( LZ_decompress_write( decoder, ibuffer, size ) != size )
|
||||
|
@@ -191,17 +220,18 @@ extern "C" void * dworker( void * arg )
|
|||
{
|
||||
const int rd = LZ_decompress_read( decoder, obuffer, buffer_size );
|
||||
if( rd < 0 )
|
||||
cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
|
||||
{ decompress_error( decoder, pp, shared_retval, worker_id );
|
||||
goto done; }
|
||||
if( rd > 0 && outfd >= 0 )
|
||||
{
|
||||
const int wr = pwriteblock( outfd, obuffer, rd, data_pos );
|
||||
if( wr != rd )
|
||||
{
|
||||
pp();
|
||||
if( verbosity >= 0 )
|
||||
std::fprintf( stderr, "Write error in worker %d: %s\n",
|
||||
worker_id, std::strerror( errno ) );
|
||||
cleanup_and_fail();
|
||||
if( shared_retval.set_value( 1 ) ) { pp();
|
||||
if( verbosity >= 0 )
|
||||
std::fprintf( stderr, "Write error in worker %d: %s\n",
|
||||
worker_id, std::strerror( errno ) ); }
|
||||
goto done;
|
||||
}
|
||||
}
|
||||
if( rd > 0 )
|
||||
|
@@ -221,98 +251,114 @@ extern "C" void * dworker( void * arg )
|
|||
}
|
||||
show_progress( lzip_index.mblock( i ).size() );
|
||||
}
|
||||
|
||||
delete[] obuffer; delete[] ibuffer;
|
||||
if( LZ_decompress_member_position( decoder ) != 0 )
|
||||
{ pp( "Error, some data remains in decoder." ); cleanup_and_fail(); }
|
||||
if( LZ_decompress_close( decoder ) < 0 )
|
||||
{ pp( "LZ_decompress_close failed." ); cleanup_and_fail(); }
|
||||
done:
|
||||
if( obuffer ) { delete[] obuffer; } delete[] ibuffer;
|
||||
if( LZ_decompress_member_position( decoder ) != 0 &&
|
||||
shared_retval.set_value( 1 ) )
|
||||
pp( "Error, some data remains in decoder." );
|
||||
if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
|
||||
pp( "LZ_decompress_close failed." );
|
||||
return 0;
|
||||
}
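Each `dworker` handles a fixed, strided subset of the members: worker `w` takes members `w`, `w + num_workers`, `w + 2 * num_workers`, and so on. Before starting the workers, `decompress()` reduces `num_workers` to the number of members, so no worker is left without a member. Every worker writes its output with `pwriteblock` at the member's own `data_pos`, so the output needs no reordering step.

The hypothetical helper below only computes that assignment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of plzip): list the member indices each
// worker would process with the striding loop used in dworker.
std::vector<std::vector<long> > assign_members( const long members,
                                                int num_workers )
  {
  if( num_workers > members )          // same clamp as in decompress()
    num_workers = static_cast<int>( members );
  std::vector<std::vector<long> > plan( num_workers );
  for( int w = 0; w < num_workers; ++w )
    for( long i = w; i < members; i += num_workers )
      plan[w].push_back( i );
  return plan;
  }
```

For 5 members and 2 workers this yields {0, 2, 4} and {1, 3}. For 2 members and 8 requested workers, only 2 workers are planned.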
|
||||
|
||||
} // end namespace
|
||||
|
||||
|
||||
// start the workers and wait for them to finish.
|
||||
// start the workers and wait for them to finish.
|
||||
int decompress( const unsigned long long cfile_size, int num_workers,
|
||||
const int infd, const int outfd, const Pretty_print & pp,
|
||||
const int debug_level, const int in_slots,
|
||||
const int out_slots, const bool ignore_trailing,
|
||||
const bool loose_trailing, const bool infd_isreg )
|
||||
const bool loose_trailing, const bool infd_isreg,
|
||||
const bool one_to_one )
|
||||
{
|
||||
if( !infd_isreg )
|
||||
return dec_stream( cfile_size, num_workers, infd, outfd, pp, debug_level,
|
||||
in_slots, out_slots, ignore_trailing, loose_trailing );
|
||||
|
||||
const Lzip_index lzip_index( infd, ignore_trailing, loose_trailing );
|
||||
if( lzip_index.retval() == 1 )
|
||||
if( lzip_index.retval() == 1 ) // decompress as stream if seek fails
|
||||
{
|
||||
lseek( infd, 0, SEEK_SET );
|
||||
return dec_stream( cfile_size, num_workers, infd, outfd, pp, debug_level,
|
||||
in_slots, out_slots, ignore_trailing, loose_trailing );
|
||||
}
|
||||
if( lzip_index.retval() != 0 )
|
||||
{ show_file_error( pp.name(), lzip_index.error().c_str() );
|
||||
return lzip_index.retval(); }
|
||||
if( lzip_index.retval() != 0 ) // corrupt or invalid input file
|
||||
{
|
||||
if( lzip_index.bad_magic() )
|
||||
show_file_error( pp.name(), lzip_index.error().c_str() );
|
||||
else pp( lzip_index.error().c_str() );
|
||||
return lzip_index.retval();
|
||||
}
|
||||
|
||||
if( num_workers > lzip_index.members() )
|
||||
num_workers = lzip_index.members();
|
||||
if( verbosity >= 1 ) pp();
|
||||
show_progress( 0, cfile_size, &pp ); // init
|
||||
if( num_workers > lzip_index.members() ) num_workers = lzip_index.members();
|
||||
|
||||
if( outfd >= 0 )
|
||||
{
|
||||
struct stat st;
|
||||
if( fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) ||
|
||||
if( !one_to_one || fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) ||
|
||||
lseek( outfd, 0, SEEK_CUR ) < 0 )
|
||||
return dec_stdout( num_workers, infd, outfd, pp, debug_level, out_slots,
|
||||
lzip_index );
|
||||
{
|
||||
if( debug_level & 2 ) std::fputs( "decompress file to stdout.\n", stderr );
|
||||
if( verbosity >= 1 ) pp();
|
||||
show_progress( 0, cfile_size, &pp ); // init
|
||||
return dec_stdout( num_workers, infd, outfd, pp, debug_level, out_slots,
|
||||
lzip_index );
|
||||
}
|
||||
}
|
||||
|
||||
if( debug_level & 2 ) std::fputs( "decompress file to file.\n", stderr );
|
||||
if( verbosity >= 1 ) pp();
|
||||
show_progress( 0, cfile_size, &pp ); // init
|
||||
|
||||
Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
|
||||
pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
|
||||
if( !worker_args || !worker_threads )
|
||||
{ pp( "Not enough memory." ); cleanup_and_fail(); }
|
||||
for( int i = 0; i < num_workers; ++i )
|
||||
{ pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }
|
||||
|
||||
#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
|
||||
const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 );
|
||||
#else
|
||||
const bool nocopy = false;
|
||||
#endif
|
||||
|
||||
Shared_retval shared_retval;
|
||||
int i = 0; // number of workers started
|
||||
for( ; i < num_workers; ++i )
|
||||
{
|
||||
worker_args[i].lzip_index = &lzip_index;
|
||||
worker_args[i].pp = &pp;
|
||||
worker_args[i].shared_retval = &shared_retval;
|
||||
worker_args[i].worker_id = i;
|
||||
worker_args[i].num_workers = num_workers;
|
||||
worker_args[i].infd = infd;
|
||||
worker_args[i].outfd = outfd;
|
||||
worker_args[i].nocopy = nocopy;
|
||||
const int errcode =
|
||||
pthread_create( &worker_threads[i], 0, dworker, &worker_args[i] );
|
||||
if( errcode )
|
||||
{ show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
|
||||
{ if( shared_retval.set_value( 1 ) )
|
||||
{ show_error( "Can't create worker threads", errcode ); } break; }
|
||||
}
|
||||
|
||||
for( int i = num_workers - 1; i >= 0; --i )
|
||||
while( --i >= 0 )
|
||||
{
|
||||
const int errcode = pthread_join( worker_threads[i], 0 );
|
||||
if( errcode )
|
||||
{ show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
|
||||
if( errcode && shared_retval.set_value( 1 ) )
|
||||
show_error( "Can't join worker threads", errcode );
|
||||
}
|
||||
delete[] worker_threads;
|
||||
delete[] worker_args;
|
||||
|
||||
if( verbosity >= 2 )
|
||||
{
|
||||
if( verbosity >= 4 ) show_header( lzip_index.dictionary_size( 0 ) );
|
||||
const unsigned long long in_size = lzip_index.cdata_size();
|
||||
const unsigned long long out_size = lzip_index.udata_size();
|
||||
if( out_size == 0 || in_size == 0 )
|
||||
std::fputs( "no data compressed. ", stderr );
|
||||
else
|
||||
std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
|
||||
(double)out_size / in_size,
|
||||
( 100.0 * in_size ) / out_size,
|
||||
100.0 - ( ( 100.0 * in_size ) / out_size ) );
|
||||
if( verbosity >= 3 )
|
||||
std::fprintf( stderr, "decompressed %9llu, compressed %8llu. ",
|
||||
out_size, in_size );
|
||||
}
|
||||
if( verbosity >= 1 ) std::fputs( (outfd < 0) ? "ok\n" : "done\n", stderr );
|
||||
if( shared_retval() ) return shared_retval(); // some thread found a problem
|
||||
|
||||
if( verbosity >= 1 )
|
||||
show_results( lzip_index.cdata_size(), lzip_index.udata_size(),
|
||||
lzip_index.dictionary_size(), outfd < 0 );
|
||||
|
||||
if( debug_level & 1 )
|
||||
std::fprintf( stderr,
|
||||
"workers started %8u\n", num_workers );
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
|
50
doc/plzip.1
|
@@ -1,5 +1,5 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||
.TH PLZIP "1" "January 2019" "plzip 1.8" "User Commands"
|
||||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.16.
|
||||
.TH PLZIP "1" "January 2021" "plzip 1.9" "User Commands"
|
||||
.SH NAME
|
||||
plzip \- reduces the size of files
|
||||
.SH SYNOPSIS
|
||||
|
@@ -7,22 +7,24 @@ plzip \- reduces the size of files
|
|||
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||
.SH DESCRIPTION
|
||||
Plzip is a massively parallel (multi\-threaded) implementation of lzip, fully
|
||||
compatible with lzip 1.4 or newer. Plzip uses the lzlib compression library.
|
||||
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
|
||||
.PP
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip \fB\-0\fR)
|
||||
or compress most files more than bzip2 (lzip \fB\-9\fR). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
|
||||
from a data recovery perspective. Lzip has been designed, written and
|
||||
tested with great care to replace gzip and bzip2 as the standard
|
||||
general\-purpose compressed format for unix\-like systems.
|
||||
Lzip is a lossless data compressor with a user interface similar to the one
|
||||
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov
|
||||
chain\-Algorithm' (LZMA) stream format, chosen to maximize safety and
|
||||
interoperability. Lzip can compress about as fast as gzip (lzip \fB\-0\fR) or
|
||||
compress most files more than bzip2 (lzip \fB\-9\fR). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
|
||||
a data recovery perspective. Lzip has been designed, written, and tested
|
||||
with great care to replace gzip and bzip2 as the standard general\-purpose
|
||||
compressed format for unix\-like systems.
|
||||
.PP
|
||||
Plzip can compress/decompress large files on multiprocessor machines
|
||||
much faster than lzip, at the cost of a slightly reduced compression
|
||||
ratio (0.4 to 2 percent larger compressed files). Note that the number
|
||||
of usable threads is limited by file size; on files larger than a few GB
|
||||
plzip can use hundreds of processors, but on files of only a few MB
|
||||
plzip is no faster than lzip.
|
||||
Plzip can compress/decompress large files on multiprocessor machines much
|
||||
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
|
||||
to 2 percent larger compressed files). Note that the number of usable
|
||||
threads is limited by file size; on files larger than a few GB plzip can use
|
||||
hundreds of processors, but on files of only a few MB plzip is no faster
|
||||
than lzip.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-h\fR, \fB\-\-help\fR
|
||||
|
@@ -62,7 +64,7 @@ set match length limit in bytes [36]
|
|||
set number of (de)compression threads [2]
|
||||
.TP
|
||||
\fB\-o\fR, \fB\-\-output=\fR<file>
|
||||
if reading standard input, write to <file>
|
||||
write to <file>, keep input files
|
||||
.TP
|
||||
\fB\-q\fR, \fB\-\-quiet\fR
|
||||
suppress all messages
|
||||
|
@@ -93,6 +95,9 @@ number of 1 MiB input packets buffered [4]
.TP
\fB\-\-out\-slots=\fR<n>
number of 1 MiB output packets buffered [64]
.TP
\fB\-\-check\-lib\fR
compare version of lzlib.h with liblz.{a,so}
.PP
If no file names are given, or if a file is '\-', plzip compresses or
decompresses from standard input to standard output.
@ -103,8 +108,11 @@ to 2^29 bytes.
.PP
The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc, you may need to use the \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR
options directly to achieve optimal performance.
etc, you may need to use the options \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR
directly to achieve optimal performance.
.PP
To extract all the files from archive 'foo.tar.lz', use the commands
\&'tar \fB\-xf\fR foo.tar.lz' or 'plzip \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'.
.PP
Exit status: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
@ -117,8 +125,8 @@ Plzip home page: http://www.nongnu.org/lzip/plzip.html
.SH COPYRIGHT
Copyright \(co 2009 Laszlo Ersek.
.br
Copyright \(co 2019 Antonio Diaz Diaz.
Using lzlib 1.11
Copyright \(co 2021 Antonio Diaz Diaz.
Using lzlib 1.12
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.
682
doc/plzip.info
@ -11,7 +11,7 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
Plzip Manual
************

This manual is for Plzip (version 1.8, 5 January 2019).
This manual is for Plzip (version 1.9, 3 January 2021).

* Menu:

@ -28,10 +28,10 @@ This manual is for Plzip (version 1.8, 5 January 2019).
* Concept index:: Index of concepts


Copyright (C) 2009-2019 Antonio Diaz Diaz.
Copyright (C) 2009-2021 Antonio Diaz Diaz.

This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.


File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
@ -39,88 +39,89 @@ File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
1 Introduction
**************

Plzip is a massively parallel (multi-threaded) implementation of lzip,
fully compatible with lzip 1.4 or newer. Plzip uses the lzlib
compression library.
Plzip is a massively parallel (multi-threaded) implementation of lzip, fully
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.

Lzip is a lossless data compressor with a user interface similar to
the one of gzip or bzip2. Lzip can compress about as fast as gzip
(lzip -0) or compress most files more than bzip2 (lzip -9).
Decompression speed is intermediate between gzip and bzip2. Lzip is
better than gzip and bzip2 from a data recovery perspective. Lzip has
been designed, written and tested with great care to replace gzip and
bzip2 as the standard general-purpose compressed format for unix-like
systems.
Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
interoperability. Lzip can compress about as fast as gzip (lzip -0) or
compress most files more than bzip2 (lzip -9). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
a data recovery perspective. Lzip has been designed, written, and tested
with great care to replace gzip and bzip2 as the standard general-purpose
compressed format for unix-like systems.

Plzip can compress/decompress large files on multiprocessor machines
much faster than lzip, at the cost of a slightly reduced compression
ratio (0.4 to 2 percent larger compressed files). Note that the number
of usable threads is limited by file size; on files larger than a few GB
plzip can use hundreds of processors, but on files of only a few MB
plzip is no faster than lzip. *Note Minimum file sizes::.
Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
to 2 percent larger compressed files). Note that the number of usable
threads is limited by file size; on files larger than a few GB plzip can use
hundreds of processors, but on files of only a few MB plzip is no faster
than lzip. *Note Minimum file sizes::.

For creation and manipulation of compressed tar archives tarlz can be
more efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members. *Note tarlz manual:
(tarlz)Top.

The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder
availability:
archiving, taking into account both data integrity and decoder availability:

* The lzip format provides very safe integrity checking and some data
recovery means. The lziprecover program can repair bit flip errors
(one of the most common forms of data corruption) in lzip files,
and provides data recovery capabilities, including error-checked
merging of damaged copies of a file. *Note Data safety:
(lziprecover)Data safety.
recovery means. The program lziprecover can repair bit flip errors
(one of the most common forms of data corruption) in lzip files, and
provides data recovery capabilities, including error-checked merging
of damaged copies of a file. *Note Data safety: (lziprecover)Data
safety.

* The lzip format is as simple as possible (but not simpler). The
lzip manual provides the source code of a simple decompressor
along with a detailed explanation of how it works, so that with
the only help of the lzip manual it would be possible for a
digital archaeologist to extract the data from a lzip file long
after quantum computers eventually render LZMA obsolete.
* The lzip format is as simple as possible (but not simpler). The lzip
manual provides the source code of a simple decompressor along with a
detailed explanation of how it works, so that with only the help of the
lzip manual it would be possible for a digital archaeologist to extract
the data from a lzip file long after quantum computers eventually
render LZMA obsolete.

* Additionally the lzip reference implementation is copylefted, which
guarantees that it will remain free forever.

A nice feature of the lzip format is that a corrupt byte is easier to
repair the nearer it is from the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.
repair the nearer it is to the beginning of the file. Therefore, with the
help of lziprecover, losing an entire archive just because of a corrupt
byte near the beginning is a thing of the past.

Plzip uses the same well-defined exit status values used by lzip,
which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs
like tar or zutils.
Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.

Plzip will automatically use for each file the largest dictionary
size that does not exceed neither the file size nor the limit given.
Keep in mind that the decompression memory requirement is affected at
compression time by the choice of dictionary size limit. *Note Memory
requirements::.
Plzip will automatically use for each file the largest dictionary size
that exceeds neither the file size nor the limit given. Keep in
mind that the decompression memory requirement is affected at compression
time by the choice of dictionary size limit. *Note Memory requirements::.

When compressing, plzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
When decompressing, plzip attempts to guess the name for the
decompressed file from that of the compressed file as follows:
with a compressed version of itself, with the name "original_name.lz". When
decompressing, plzip attempts to guess the name for the decompressed file
from that of the compressed file as follows:

filename.lz becomes filename
filename.tlz becomes filename.tar
anyothername becomes anyothername.out

(De)compressing a file is much like copying or moving it; therefore
plzip preserves the access and modification dates, permissions, and,
when possible, ownership of the file just as 'cp -p' does. (If the user
ID or the group ID can't be duplicated, the file permission bits
S_ISUID and S_ISGID are cleared).
(De)compressing a file is much like copying or moving it; therefore plzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as 'cp -p' does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).

Plzip is able to read from some types of non regular files if the
'--stdout' option is specified.
Plzip is able to read from some types of non-regular files if either the
option '-c' or the option '-o' is specified.

If no file names are specified, plzip compresses (or decompresses)
from standard input to standard output. In this case, plzip will
decline to write compressed output to a terminal, as this would be
entirely incomprehensible and therefore pointless.
Plzip will refuse to read compressed data from a terminal or write
compressed data to a terminal, as this would be entirely incomprehensible
and might leave the terminal in an abnormal state.

Plzip will correctly decompress a file which is the concatenation of
two or more compressed files. The result is the concatenation of the
Plzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
corresponding decompressed files. Integrity testing of concatenated
compressed files is also supported.

@ -135,41 +136,40 @@ The output of plzip looks like this:
plzip -v foo
foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.

plzip -tvv foo.lz
foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. ok
plzip -tvvv foo.lz
foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok

The meaning of each field is as follows:

'N:1'
The compression ratio (uncompressed_size / compressed_size), shown
as N to 1.
The compression ratio (uncompressed_size / compressed_size), shown as
N to 1.

'ratio'
The inverse compression ratio
(compressed_size / uncompressed_size), shown as a percentage. A
decimal ratio is easily obtained by moving the decimal point two
places to the left; 14.98% = 0.1498.
The inverse compression ratio (compressed_size / uncompressed_size),
shown as a percentage. A decimal ratio is easily obtained by moving the
decimal point two places to the left; 14.98% = 0.1498.

'saved'
The space saved by compression (1 - ratio), shown as a percentage.

'in'
The size of the uncompressed data. When decompressing or testing,
it is shown as 'decompressed'. Note that plzip always prints the
uncompressed size before the compressed size when compressing,
decompressing, testing or listing.
Size of the input data. This is the uncompressed size when
compressing, or the compressed size when decompressing or testing.
Note that plzip always prints the uncompressed size before the
compressed size when compressing, decompressing, testing, or listing.

'out'
The size of the compressed data. When decompressing or testing, it
is shown as 'compressed'.
Size of the output data. This is the compressed size when compressing,
or the decompressed size when decompressing or testing.


When decompressing or testing at verbosity level 4 (-vvvv), the
dictionary size used to compress the file is also shown.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may
never have been compressed. Decompressed is used to refer to data which
have undergone the process of decompression.
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
have been compressed. Decompressed is used to refer to data which have
undergone the process of decompression.


File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top
@ -181,11 +181,13 @@ The format for running plzip is:

plzip [OPTIONS] [FILES]

'-' used as a FILE argument means standard input. It can be mixed with
other FILES and is read just once, the first time it appears in the
command line.
If no file names are specified, plzip compresses (or decompresses) from
standard input to standard output. A hyphen '-' used as a FILE argument
means standard input. It can be mixed with other FILES and is read just
once, the first time it appears in the command line.

plzip supports the following options:
plzip supports the following options: *Note Argument syntax:
(arg_parser)Argument syntax.

'-h'
'--help'
@ -199,32 +201,33 @@ command line.
'-a'
'--trailing-error'
Exit with error status 2 if any remaining input is detected after
decompressing the last member. Such remaining input is usually
trailing garbage that can be safely ignored. *Note
concat-example::.
decompressing the last member. Such remaining input is usually trailing
garbage that can be safely ignored. *Note concat-example::.

'-B BYTES'
'--data-size=BYTES'
When compressing, set the size of the input data blocks in bytes.
The input file will be divided in chunks of this size before
compression is performed. Valid values range from 8 KiB to 1 GiB.
Default value is two times the dictionary size, except for option
'-0' where it defaults to 1 MiB. Plzip will reduce the dictionary
size if it is larger than the chosen data size.
When compressing, set the size of the input data blocks in bytes. The
input file will be divided in chunks of this size before compression is
performed. Valid values range from 8 KiB to 1 GiB. Default value is
two times the dictionary size, except for option '-0' where it
defaults to 1 MiB. Plzip will reduce the dictionary size if it is
larger than the data size specified. *Note Minimum file sizes::.

'-c'
'--stdout'
Compress or decompress to standard output; keep input files
unchanged. If compressing several files, each file is compressed
independently. This option is needed when reading from a named
pipe (fifo) or from a device.
Compress or decompress to standard output; keep input files unchanged.
If compressing several files, each file is compressed independently.
This option (or '-o') is needed when reading from a named pipe (fifo)
or from a device. Use 'lziprecover -cd -i' to recover as much of the
decompressed data as possible when decompressing a corrupt file. '-c'
overrides '-o'. '-c' has no effect when testing or listing.

'-d'
'--decompress'
Decompress the specified files. If a file does not exist or can't
be opened, plzip continues decompressing the rest of the files. If
a file fails to decompress, or is a terminal, plzip exits
immediately without decompressing the rest of the files.
Decompress the files specified. If a file does not exist or can't be
opened, plzip continues decompressing the rest of the files. If a file
fails to decompress, or is a terminal, plzip exits immediately without
decompressing the rest of the files.

'-f'
'--force'
@ -232,59 +235,69 @@ command line.

'-F'
'--recompress'
When compressing, force re-compression of files whose name already
has the '.lz' or '.tlz' suffix.
When compressing, force re-compression of files whose name already has
the '.lz' or '.tlz' suffix.

'-k'
'--keep'
Keep (don't delete) input files during compression or
decompression.
Keep (don't delete) input files during compression or decompression.

'-l'
'--list'
Print the uncompressed size, compressed size and percentage saved
of the specified files. Trailing data are ignored. The values
produced are correct even for multimember files. If more than one
file is given, a final line containing the cumulative sizes is
printed. With '-v', the dictionary size, the number of members in
the file, and the amount of trailing data (if any) are also
printed. With '-vv', the positions and sizes of each member in
multimember files are also printed. '-lq' can be used to verify
quickly (without decompressing) the structural integrity of the
specified files. (Use '--test' to verify the data integrity).
'-alq' additionally verifies that none of the specified files
contain trailing data.
Print the uncompressed size, compressed size, and percentage saved of
the files specified. Trailing data are ignored. The values produced
are correct even for multimember files. If more than one file is
given, a final line containing the cumulative sizes is printed. With
'-v', the dictionary size, the number of members in the file, and the
amount of trailing data (if any) are also printed. With '-vv', the
positions and sizes of each member in multimember files are also
printed.

'-lq' can be used to verify quickly (without decompressing) the
structural integrity of the files specified. (Use '--test' to verify
the data integrity). '-alq' additionally verifies that none of the
files specified contain trailing data.

'-m BYTES'
'--match-length=BYTES'
When compressing, set the match length limit in bytes. After a
match this long is found, the search is finished. Valid values
range from 5 to 273. Larger values usually give better compression
ratios but longer compression times.
When compressing, set the match length limit in bytes. After a match
this long is found, the search is finished. Valid values range from 5
to 273. Larger values usually give better compression ratios but longer
compression times.

'-n N'
'--threads=N'
Set the number of worker threads, overriding the system's default.
Valid values range from 1 to "as many as your system can support".
If this option is not used, plzip tries to detect the number of
processors in the system and use it as default value. When
compressing on a 32 bit system, plzip tries to limit the memory
use to under 2.22 GiB (4 worker threads at level -9) by reducing
the number of threads below the system's default. 'plzip --help'
shows the system's default value.
Set the maximum number of worker threads, overriding the system's
default. Valid values range from 1 to "as many as your system can
support". If this option is not used, plzip tries to detect the number
of processors in the system and use it as default value. When
compressing on a 32 bit system, plzip tries to limit the memory use to
under 2.22 GiB (4 worker threads at level -9) by reducing the number
of threads below the system's default. 'plzip --help' shows the
system's default value.

Note that the number of usable threads is limited to
ceil( file_size / data_size ) during compression (*note Minimum
file sizes::), and to the number of members in the input during
decompression.
Plzip starts the number of threads required by each file without
exceeding the value specified. Note that the number of usable threads
is limited to ceil( file_size / data_size ) during compression (*note
Minimum file sizes::), and to the number of members in the input
during decompression. You can find the number of members in a lzip
file by running 'plzip -lv file.lz'.

'-o FILE'
'--output=FILE'
When reading from standard input and '--stdout' has not been
specified, use 'FILE' as the virtual name of the uncompressed
file. This produces a file named 'FILE' when decompressing, or a
file named 'FILE.lz' when compressing. A second '.lz' extension is
not added if 'FILE' already ends in '.lz' or '.tlz'.
If '-c' has not been also specified, write the (de)compressed output to
FILE; keep input files unchanged. If compressing several files, each
file is compressed independently. This option (or '-c') is needed when
reading from a named pipe (fifo) or from a device. '-o -' is
equivalent to '-c'. '-o' has no effect when testing or listing.

In order to keep backward compatibility with plzip versions prior to
1.9, when compressing from standard input and no other file names are
given, the extension '.lz' is appended to FILE unless it already ends
in '.lz' or '.tlz'. This feature will be removed in a future version
of plzip. Meanwhile, redirection may be used instead of '-o' to write
the compressed output to a file without the extension '.lz' in its
name: 'plzip < file > foo'.

'-q'
'--quiet'
@ -292,30 +305,28 @@ command line.

'-s BYTES'
'--dictionary-size=BYTES'
When compressing, set the dictionary size limit in bytes. Plzip
will use for each file the largest dictionary size that does not
exceed neither the file size nor this limit. Valid values range
from 4 KiB to 512 MiB. Values 12 to 29 are interpreted as powers
of two, meaning 2^12 to 2^29 bytes. Dictionary sizes are quantized
so that they can be coded in just one byte (*note
coded-dict-size::). If the specified size does not match one of
the valid sizes, it will be rounded upwards by adding up to
(BYTES / 8) to it.
When compressing, set the dictionary size limit in bytes. Plzip will
use for each file the largest dictionary size that exceeds neither the
file size nor this limit. Valid values range from 4 KiB to
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
coded in just one byte (*note coded-dict-size::). If the size specified
does not match one of the valid sizes, it will be rounded upwards by
adding up to (BYTES / 8) to it.

For maximum compression you should use a dictionary size limit as
large as possible, but keep in mind that the decompression memory
requirement is affected at compression time by the choice of
dictionary size limit.
For maximum compression you should use a dictionary size limit as large
as possible, but keep in mind that the decompression memory requirement
is affected at compression time by the choice of dictionary size limit.

'-t'
'--test'
Check integrity of the specified files, but don't decompress them.
This really performs a trial decompression and throws away the
result. Use it together with '-v' to see information about the
files. If a file does not exist, can't be opened, or is a
terminal, plzip continues checking the rest of the files. If a
file fails the test, plzip may be unable to check the rest of the
files.
Check integrity of the files specified, but don't decompress them. This
really performs a trial decompression and throws away the result. Use
it together with '-v' to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal,
plzip continues checking the rest of the files. A final diagnostic is
shown at verbosity level 1 or higher if any file fails the test when
testing multiple files.

'-v'
'--verbose'
@ -323,26 +334,26 @@ command line.
When compressing, show the compression ratio and size for each file
processed.
When decompressing or testing, further -v's (up to 4) increase the
verbosity level, showing status, compression ratio, dictionary
size, decompressed size, and compressed size.
Two or more '-v' options show the progress of (de)compression,
except for single-member files.
verbosity level, showing status, compression ratio, dictionary size,
decompressed size, and compressed size.
Two or more '-v' options show the progress of (de)compression, except
for single-member files.

'-0 .. -9'
Compression level. Set the compression parameters (dictionary size
and match length limit) as shown in the table below. The default
compression level is '-6', equivalent to '-s8MiB -m36'. Note that
'-9' can be much slower than '-0'. These options have no effect
when decompressing, testing or listing.
Compression level. Set the compression parameters (dictionary size and
match length limit) as shown in the table below. The default
compression level is '-6', equivalent to '-s8MiB -m36'. Note that '-9'
can be much slower than '-0'. These options have no effect when
decompressing, testing, or listing.

The bidimensional parameter space of LZMA can't be mapped to a
linear scale optimal for all files. If your files are large, very
repetitive, etc, you may need to use the '--dictionary-size' and
'--match-length' options directly to achieve optimal performance.
The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc, you may need to use the options '--dictionary-size' and
'--match-length' directly to achieve optimal performance.

If several compression levels or '-s' or '-m' options are given,
the last setting is used. For example '-9 -s64MiB' is equivalent
to '-s64MiB -m273'
If several compression levels or '-s' or '-m' options are given, the
last setting is used. For example '-9 -s64MiB' is equivalent to
'-s64MiB -m273'.

Level   Dictionary size (-s)   Match length limit (-m)
-0      64 KiB                 16 bytes
@ -361,23 +372,33 @@ command line.
Aliases for GNU gzip compatibility.

'--loose-trailing'
When decompressing, testing or listing, allow trailing data whose
first bytes are so similar to the magic bytes of a lzip header
that they can be confused with a corrupt header. Use this option
if a file triggers a "corrupt header" error and the cause is not
indeed a corrupt header.
When decompressing, testing, or listing, allow trailing data whose
first bytes are so similar to the magic bytes of a lzip header that
they can be confused with a corrupt header. Use this option if a file
triggers a "corrupt header" error and the cause is not actually a
corrupt header.

'--in-slots=N'
Number of 1 MiB input packets buffered per worker thread when
decompressing from non-seekable input. Increasing the number of
packets may increase decompression speed, but requires more
memory. Valid values range from 1 to 64. The default value is 4.
decompressing from non-seekable input. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid
values range from 1 to 64. The default value is 4.

'--out-slots=N'
Number of 1 MiB output packets buffered per worker thread when
decompressing to non-seekable output. Increasing the number of
packets may increase decompression speed, but requires more
memory. Valid values range from 1 to 1024. The default value is 64.
decompressing to non-seekable output. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid
values range from 1 to 1024. The default value is 64.

'--check-lib'
Compare the version of lzlib used to compile plzip with the version
actually being used at run time and exit. Report any differences
found. Exit with error status 1 if differences are found. A mismatch
may indicate that lzlib is not correctly installed or that a different
version of lzlib has been installed after compiling plzip.
'plzip -v --check-lib' shows the version of lzlib being used and the
value of 'LZ_API_VERSION' (if defined). *Note Library version:
(lzlib)Library version.


Numbers given as arguments to options may be followed by a multiplier
@ -396,36 +417,36 @@ Z zettabyte (10^21) | Zi zebibyte (2^70)
Y yottabyte (10^24) | Yi yobibyte (2^80)


Exit status: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
invalid input file, 3 for an internal consistency error (eg, bug) which
caused plzip to panic.
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
input file, 3 for an internal consistency error (eg, bug) which caused
plzip to panic.
|
||||
|
||||
|
||||
File: plzip.info, Node: Program design, Next: File format, Prev: Invoking plzip, Up: Top
|
||||
|
||||
4 Program design
|
||||
****************
|
||||
4 Internal structure of plzip
|
||||
*****************************
|
||||
|
||||
When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as there are worker threads, creating a
multimember compressed file.

When decompressing, plzip decompresses as many members simultaneously as
there are worker threads. Files that were compressed with lzip will not be
decompressed faster than with lzip (unless the option '-b' was used),
because lzip usually produces single-member files, which can't be
decompressed in parallel.

For each input file, a splitter thread and several worker threads are
created, with the main thread acting as muxer (multiplexer) thread. A
"packet courier" takes care of data transfers among threads and limits the
maximum number of data blocks (packets) being processed simultaneously.

The splitter reads data blocks from the input file and distributes them
to the workers. The workers (de)compress the blocks received from the
splitter. The muxer collects processed packets from the workers and writes
them to the output file.

,------------,
,-->| worker 0 |--,
`-->| worker N-1 |--'
`------------'

When decompressing from a regular file, the splitter is removed and the
workers read directly from the input file. If the output file is also a
regular file, the muxer is also removed and the workers write directly to
the output file. With these optimizations, the use of RAM is greatly
reduced and the decompression speed of large files with many members is
limited only by the number of processors available and by I/O speed.

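The thread layout described above can be modeled with standard C++ threads. This is a simplified sketch, not plzip's source, and it rests on several assumptions:

* 'Courier' and 'process' are hypothetical names.
* An arbitrary per-block function stands in for (de)compression.
* The limit of 4 packets in flight is an arbitrary choice.
* The regular-file optimizations are not modeled.

It needs C++17 (for std::optional) and thread support (e.g. '-pthread').

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Packet = std::pair<std::size_t, std::string>;  // (sequence number, data)

// Hypothetical packet courier: carries packets from the splitter to the
// workers and from the workers to the muxer, and never lets more than
// 'max_packets' packets be in flight at once.
class Courier {
 public:
  explicit Courier(std::size_t max_packets) : max_packets_(max_packets) {}

  void deliver_input(Packet packet) {  // splitter; blocks at the limit
    std::unique_lock<std::mutex> lock(mutex_);
    slot_free_.wait(lock, [&] { return in_flight_ < max_packets_; });
    ++in_flight_;
    input_.push(std::move(packet));
    input_ready_.notify_one();
  }
  void finish_input() {  // splitter, at end of input
    std::lock_guard<std::mutex> lock(mutex_);
    eof_ = true;
    input_ready_.notify_all();
  }
  std::optional<Packet> take_input() {  // workers; empty once input is over
    std::unique_lock<std::mutex> lock(mutex_);
    input_ready_.wait(lock, [&] { return !input_.empty() || eof_; });
    if (input_.empty()) return std::nullopt;
    Packet packet = std::move(input_.front());
    input_.pop();
    return packet;
  }
  void deliver_output(Packet packet) {  // workers
    std::lock_guard<std::mutex> lock(mutex_);
    output_[packet.first] = std::move(packet.second);
    output_ready_.notify_one();
  }
  std::string take_output(std::size_t id) {  // muxer; waits for packet 'id'
    std::unique_lock<std::mutex> lock(mutex_);
    output_ready_.wait(lock, [&] { return output_.count(id) != 0; });
    std::string data = std::move(output_[id]);
    output_.erase(id);
    --in_flight_;
    slot_free_.notify_one();
    return data;
  }

 private:
  std::mutex mutex_;
  std::condition_variable slot_free_, input_ready_, output_ready_;
  std::queue<Packet> input_;
  std::map<std::size_t, std::string> output_;
  std::size_t max_packets_;
  std::size_t in_flight_ = 0;
  bool eof_ = false;
};

// Split 'input' into blocks, transform them in 'n_workers' threads, and let
// the calling thread act as the muxer that reassembles them in order.
// 'block_size' and 'n_workers' must be nonzero.
std::string process(const std::string& input, std::size_t block_size,
                    unsigned n_workers,
                    const std::function<std::string(const std::string&)>& work) {
  const std::size_t n_packets = (input.size() + block_size - 1) / block_size;
  Courier courier(4);  // at most 4 packets in flight (arbitrary choice)
  std::thread splitter([&] {
    for (std::size_t id = 0; id < n_packets; ++id)
      courier.deliver_input({id, input.substr(id * block_size, block_size)});
    courier.finish_input();
  });
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < n_workers; ++i)
    workers.emplace_back([&] {
      while (std::optional<Packet> packet = courier.take_input())
        courier.deliver_output({packet->first, work(packet->second)});
    });
  std::string output;
  for (std::size_t id = 0; id < n_packets; ++id) output += courier.take_output(id);
  splitter.join();
  for (std::thread& worker : workers) worker.join();
  return output;
}
```

The muxer asks the courier for packets by sequence number, so the output keeps the input order no matter which worker finishes first. The splitter blocks whenever the in-flight limit is reached, which is the throttling role the text gives the courier.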
File: plzip.info, Node: File format, Next: Memory requirements, Prev: Program design, Up: Top
In the diagram below, a box like this:

+---+
|   | <-- the vertical bars might be missing
+---+

represents one byte; a box like this:

+==============+
|              |
+==============+

represents a variable number of bytes.

A lzip file consists of a series of "members" (compressed data sets).
The members simply appear one after another in the file, with no additional
information before, between, or after them.

Each member has the following structure:

+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All multibyte values are stored in little endian order.

'ID string (the "magic" bytes)'
A four byte string, identifying the lzip format, with the value "LZIP"
(0x4C, 0x5A, 0x49, 0x50).

'VN (version number, 1 byte)'
Just in case something needs to be modified in the future. 1 for now.

'DS (coded dictionary size, 1 byte)'
The dictionary size is calculated by taking a power of 2 (the base
size) and subtracting from it a fraction between 0/16 and 7/16 of the
base size.
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
from the base size to obtain the dictionary size.

'LZMA stream'
The LZMA stream, finished by an end of stream marker. Uses default
values for encoder properties. *Note Stream format: (lzip)Stream
format, for a complete description.

'CRC32 (4 bytes)'
Cyclic Redundancy Check (CRC) of the uncompressed original data.

'Data size (8 bytes)'
Size of the uncompressed original data.

'Member size (8 bytes)'
Total size of the member, including header and trailer. This field acts
as a distributed index, allows the verification of stream integrity,
and facilitates safe recovery of undamaged members from multimember
files.
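
The member layout above is enough to find every member of a multimember file starting from its end, using the 'Member size' field as the distributed index mentioned in its description. This sketch uses hypothetical names ('get_le', 'decode_dictionary_size', 'index_members') and has several limitations:

* It keeps the whole file in memory.
* It does not decode the LZMA stream or verify the CRC32.
* It treats trailing data as an error, whereas plzip accepts trailing data by default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode an n-byte little-endian unsigned value starting at p.
std::uint64_t get_le(const std::uint8_t* p, int n) {
  std::uint64_t value = 0;
  for (int i = n - 1; i >= 0; --i) value = (value << 8) | p[i];
  return value;
}

// DS byte: base size 2^(bits 4-0) minus (bits 7-5)/16 of the base size.
// Returns 0 if bits 4-0 are outside the valid range 12 to 29.
std::uint32_t decode_dictionary_size(std::uint8_t ds) {
  const unsigned log2_base = ds & 0x1F;
  if (log2_base < 12 || log2_base > 29) return 0;
  const std::uint32_t base = std::uint32_t(1) << log2_base;
  return base - (base / 16) * (ds >> 5);
}

// Header: "LZIP" magic bytes, version number 1, and a valid coded DS.
bool valid_header(const std::uint8_t* p) {
  return p[0] == 'L' && p[1] == 'Z' && p[2] == 'I' && p[3] == 'P' &&
         p[4] == 1 && decode_dictionary_size(p[5]) != 0;
}

struct Member {
  std::uint64_t offset, size, data_size;  // position and sizes in bytes
  std::uint32_t crc;                      // CRC32 of the uncompressed data
};

// Walk the file backwards from its end: the last 8 bytes of each trailer
// give the member size, which locates the header of that member.
// Returns the members in file order, or nothing if any field is inconsistent.
std::vector<Member> index_members(const std::vector<std::uint8_t>& file) {
  const std::uint64_t header_size = 6, trailer_size = 20;
  std::vector<Member> members;
  std::uint64_t end = file.size();
  while (end > 0) {
    if (end < header_size + trailer_size) return {};
    const std::uint8_t* trailer = file.data() + (end - trailer_size);
    const std::uint64_t size = get_le(trailer + 12, 8);  // Member size
    if (size < header_size + trailer_size || size > end) return {};
    const std::uint64_t begin = end - size;
    if (!valid_header(file.data() + begin)) return {};
    members.insert(members.begin(),
                   Member{begin, size, get_le(trailer + 4, 8),            // Data size
                          static_cast<std::uint32_t>(get_le(trailer, 4))});  // CRC32
    end = begin;
  }
  return members;
}
```

For instance, 'decode_dictionary_size(0xD3)' gives 2^19 - 6 * 2^15 = 327680 bytes (320 KiB). Once the members are indexed this way, each one can be read independently, which is one way the workers could read directly from a regular input file.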
The amount of memory required *per worker thread* for decompression or
testing is approximately the following:

* For decompression of a regular (seekable) file to another regular file,
or for testing of a regular file; the dictionary size.

* For testing of a non-seekable file or of standard input; the dictionary
size plus 1 MiB plus up to the number of 1 MiB input packets buffered
(4 by default).

* For decompression of a regular file to a non-seekable file or to
standard output; the dictionary size plus up to the number of 1 MiB
output packets buffered (64 by default).

* For decompression of a non-seekable file or of standard input; the
dictionary size plus 1 MiB plus up to the number of 1 MiB input and
output packets buffered (68 by default).

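The four cases above reduce to simple arithmetic on the dictionary size. This is a minimal sketch, assuming the default packet counts quoted in the list and 1 MiB packets. The enum and function names are hypothetical, and results are per worker thread, in MiB.

```cpp
#include <cassert>

// The four decompression/testing cases listed above (hypothetical names).
enum class Case {
  regular_to_regular,      // regular file to regular file, or test a regular file
  test_nonseekable,        // test a non-seekable file or standard input
  regular_to_nonseekable,  // regular file to a non-seekable file or stdout
  nonseekable_input        // decompress a non-seekable file or standard input
};

// Approximate maximum memory per worker thread, in MiB, with the default
// number of buffered 1 MiB packets (4 input, 64 output, 68 input and output).
double decompression_memory_mib(double dictionary_mib, Case c) {
  switch (c) {
    case Case::regular_to_regular:     return dictionary_mib;
    case Case::test_nonseekable:       return dictionary_mib + 1 + 4;
    case Case::regular_to_nonseekable: return dictionary_mib + 64;
    case Case::nonseekable_input:      return dictionary_mib + 1 + 68;
  }
  return 0;  // not reached
}
```

For example, with an 8 MiB dictionary, decompressing standard input needs up to about 8 + 1 + 68 = 77 MiB per worker thread. When both input and output are regular files, it needs 8 MiB.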

The amount of memory required *per worker thread* for compression is
approximately the following:

* For compression at other levels; 11 times the dictionary size plus
3.375 times the data size. Default is 142 MiB.

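As a worked check of the formula, take the default level with an 8 MiB dictionary and a data size of twice that (16 MiB); these values are assumptions. It gives 11 * 8 + 3.375 * 16 = 88 + 54 = 142 MiB, matching the default quoted above. The same arithmetic as a small sketch follows. The function names are hypothetical, and the total ignores any memory used outside the workers.

```cpp
#include <cassert>

// Memory per worker thread, in MiB, for compression at levels other than -0.
double compression_memory_mib(double dictionary_mib, double data_mib) {
  return 11 * dictionary_mib + 3.375 * data_mib;
}

// Rough total for 'threads' worker threads (splitter/muxer buffers ignored).
double total_compression_memory_mib(unsigned threads, double dictionary_mib,
                                    double data_mib) {
  return threads * compression_memory_mib(dictionary_mib, data_mib);
}
```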

The following table shows the memory required *per thread* for compression
at a given level, using the default data size for each level:

Level Memory required
-0 4.875 MiB
7 Minimum file sizes required for full compression speed
********************************************************

When compressing, plzip divides the input file into chunks and compresses
as many chunks simultaneously as there are worker threads, creating a
multimember compressed file.

For this to work as expected (and roughly multiply the compression speed
by the number of available processors), the uncompressed file must be at
least as large as the number of worker threads times the chunk size (*note
--data-size::). Otherwise some processors will not get any data to
compress, and compression will be proportionally slower. The maximum speed
increase achievable on a given file is limited by the ratio
(file_size / data_size). For example, a tarball the size of gcc or linux
will scale up to 10 or 14 processors at level -9.

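The two rules in the paragraph above can be written down directly. This is a minimal sketch with hypothetical names and sizes in MiB. The minimum file size is the number of worker threads times the data size. The speed-up is bounded both by the number of workers and by the ratio file_size / data_size.

```cpp
#include <algorithm>
#include <cassert>

// Smallest uncompressed size that gives every worker at least one full chunk.
double min_file_size_mib(unsigned workers, double data_size_mib) {
  return workers * data_size_mib;
}

// Upper bound on the compression speed-up over a single worker thread.
double max_speedup(double file_size_mib, double data_size_mib, unsigned workers) {
  return std::min(file_size_mib / data_size_mib, static_cast<double>(workers));
}
```

For example, with a 16 MiB data size, a 40 MiB file can keep at most about 2.5 processors busy on average, however many workers are started.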

The following table shows the minimum uncompressed file size needed for
full use of N processors at a given compression level, using the default
data size for each level:

Processors 2 4 8 16 64 256
------------------------------------------------------------------

Sometimes extra data are found appended to a lzip file after the last
member. Such trailing data may be:

* Padding added to make the file size a multiple of some block size, for
example when writing to a tape. It is safe to append any amount of
padding zero bytes to a lzip file.

* Useful data added by the user; a cryptographically secure hash, a
description of file contents, etc. It is safe to append any amount of
text to a lzip file as long as none of the first four bytes of the text
match the corresponding byte in the string "LZIP", and the text does
not contain any zero bytes (null characters). Nonzero bytes and zero
bytes can't be safely mixed in trailing data.

* Garbage added by some not totally successful copy operation.

* Malicious data added to the file in order to make its total size and
hash value (for a chosen hash) coincide with those of another file.

* In rare cases, trailing data could be the corrupt header of another
member. In multimember or concatenated files the probability of
corruption happening in the magic bytes is 5 times smaller than the
probability of getting a false positive caused by the corruption of the
integrity information itself. Therefore it can be considered to be
below the noise level. Additionally, the test used by plzip to
discriminate trailing data from a corrupt header has a Hamming
distance (HD) of 3, and the 3 bit flips must happen in different magic
bytes for the test to fail. In any case, the option '--trailing-error'
guarantees that any corrupt header will be detected.

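The conditions for data that can be safely appended (the first two items above) can be expressed as a check. This is a hypothetical helper written from the rules stated here. It is not the test plzip itself uses to tell trailing data from a corrupt header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if 'data' could be appended after the last member according to the
// rules above: either zero bytes only (padding), or text that contains no
// zero bytes and whose first four bytes each differ from the corresponding
// byte of "LZIP".
bool safe_to_append(const std::string& data) {
  if (data.find_first_not_of('\0') == std::string::npos)
    return true;  // only zero bytes (or nothing at all)
  if (data.find('\0') != std::string::npos)
    return false;  // zero and nonzero bytes mixed
  const char magic[] = "LZIP";
  for (std::size_t i = 0; i < 4 && i < data.size(); ++i)
    if (data[i] == magic[i]) return false;
  return true;
}
```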

Trailing data are in no way part of the lzip file format, but tools
reading lzip files are expected to behave as correctly and usefully as
possible in the presence of trailing data.

Trailing data can be safely ignored in most cases. In some cases, like
that of user-added data, they are expected to be ignored. In those cases
where a file containing trailing data must be rejected, the option
'--trailing-error' can be used. *Note --trailing-error::.

WARNING! Even if plzip is bug-free, other causes may result in a corrupt
compressed file (bugs in the system libraries, memory errors, etc.).
Therefore, if the data you are going to compress are important, give the
option '--keep' to plzip and don't remove the original file until you
verify the compressed file with a command like
'plzip -cd file.lz | cmp file -'. Most RAM errors happening during
compression can only be detected by comparing the compressed file with the
original because the corruption happens before plzip compresses the RAM
contents, resulting in a valid compressed file containing wrong data.

Example 1: Extract all the files from archive 'foo.tar.lz'.

tar -xf foo.tar.lz
or
plzip -cd foo.tar.lz | tar -xf -

Example 2: Replace a regular file with its compressed version 'file.lz' and
show the compression ratio.

plzip -v file

Example 3: Like example 2 but the created 'file.lz' has a block size of
1 MiB. The compression ratio is not shown.

plzip -B 1MiB file

Example 4: Restore a regular file from its compressed version 'file.lz'. If
the operation is successful, 'file.lz' is removed.

plzip -d file.lz

Example 5: Verify the integrity of the compressed file 'file.lz' and show
status.

plzip -tv file.lz

Example 6: Compress a whole device in /dev/sdc and send the output to
'file.lz'.

plzip -c /dev/sdc > file.lz
or
plzip /dev/sdc -o file.lz

Example 7: The right way of concatenating the decompressed output of two or
more compressed files. *Note Trailing data::.

Don't do this
cat file1.lz file2.lz file3.lz | plzip -d -
Do this instead
plzip -cd file1.lz file2.lz file3.lz

Example 8: Decompress 'file.lz' partially until 10 KiB of decompressed data
are produced.

plzip -cd file.lz | dd bs=1024 count=10

Example 9: Decompress 'file.lz' partially from decompressed byte at offset
10000 to decompressed byte at offset 14999 (5000 bytes are produced).

plzip -cd file.lz | dd bs=1000 skip=10 count=5

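The 'dd' arguments in Example 9 select bytes 10 * 1000 = 10000 through 10000 + 5 * 1000 - 1 = 14999. A program reading the decompressed stream could do the same with a helper like the hypothetical 'copy_range' below. It is a sketch that reads from any std::istream, for instance standard input when run as 'plzip -cd file.lz | program'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream, handy for trying copy_range in memory
#include <string>

// Copy 'count' bytes starting at byte offset 'offset' of 'in' to 'out', the
// equivalent of 'dd bs=1000 skip=10 count=5' for offset 10000, count 5000.
// Returns the number of bytes copied, which is smaller at end of input.
std::uint64_t copy_range(std::istream& in, std::ostream& out,
                         std::uint64_t offset, std::uint64_t count) {
  in.ignore(static_cast<std::streamsize>(offset));  // skip the leading bytes
  char buffer[4096];
  std::uint64_t copied = 0;
  while (copied < count && in) {
    const std::uint64_t want = std::min<std::uint64_t>(sizeof buffer, count - copied);
    in.read(buffer, static_cast<std::streamsize>(want));
    out.write(buffer, in.gcount());
    copied += static_cast<std::uint64_t>(in.gcount());
  }
  return copied;
}
```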
10 Reporting bugs
*****************

There are probably bugs in plzip. There are certainly errors and omissions
in this manual. If you report them, they will get fixed. If you don't, no
one will ever know about them and they will remain unfixed for all
eternity, if not longer.

If you find a bug in plzip, please send electronic mail to
<lzip-bug@nongnu.org>. Include the version number, which you can find by
running 'plzip --version'.

File: plzip.info, Node: Concept index, Prev: Problems, Up: Top

Concept index