Merging upstream version 1.12~rc1.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 4ddb634c25
commit cd6a248630
24 changed files with 874 additions and 719 deletions
COPYING (3 lines changed)
@@ -1,8 +1,7 @@
 GNU GENERAL PUBLIC LICENSE
 Version 2, June 1991
 
-Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
-51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+Copyright (C) 1989, 1991 Free Software Foundation, Inc. <http://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
 
ChangeLog (74 lines changed)
|
@@ -1,3 +1,13 @@
|
|||
2024-11-19 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.12-rc1 released.
|
||||
* decompress.cc (decompress), list.cc (list_files):
|
||||
Return 2 if any empty member is found in a multimember file.
|
||||
* dec_stdout.cc, dec_stream.cc:
|
||||
Change 'deliver_packet' to 'deliver_packets'.
|
||||
* plzip.texi: New chapter 'Syntax of command-line arguments'.
|
||||
* check.sh: Use 'cp' instead of 'cat'.
|
||||
|
||||
2024-01-21 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.11 released.
|
||||
|
@@ -20,16 +30,17 @@
|
|||
2021-01-03 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.9 released.
|
||||
* New option '--check-lib'.
|
||||
* main.cc (main): Report an error if a file name is empty.
|
||||
(main): Show final diagnostic when testing multiple files.
|
||||
Make '-o' behave like '-c', but writing to file instead of stdout.
|
||||
Make '-c' and '-o' check whether the output is a terminal only once.
|
||||
Do not open output if input is a terminal.
|
||||
* main.cc: New option '--check-lib'.
|
||||
Set a valid invocation_name even if argc == 0.
|
||||
* Replace 'decompressed', 'compressed' with 'out', 'in' in output.
|
||||
* decompress.cc, dec_stream.cc, dec_stdout.cc:
|
||||
* decompress.cc, dec_stdout.cc, dec_stream.cc:
|
||||
Continue testing if any input file fails the test.
|
||||
Show the largest dictionary size in a multimember file.
|
||||
* main.cc: Show final diagnostic when testing multiple files.
|
||||
* decompress.cc, dec_stream.cc [LZ_API_VERSION >= 1012]: Avoid
|
||||
copying decompressed data when testing with lzlib 1.12 or newer.
|
||||
* compress.cc, dec_stream.cc: Start only the worker threads required.
|
||||
|
@@ -38,47 +49,46 @@
|
|||
Use plain comparison instead of Boyer-Moore to search for headers.
|
||||
* lzip_index.cc: Improve messages for corruption in last header.
|
||||
* decompress.cc: Shorten messages 'Data error' and 'Unexpected EOF'.
|
||||
* main.cc: Set a valid invocation_name even if argc == 0.
|
||||
* Document extraction from tar.lz in manual, '--help', and man page.
|
||||
* plzip.texi (Introduction): Mention tarlz as an alternative.
|
||||
* plzip.texi: Several fixes and improvements.
|
||||
Several fixes and improvements.
|
||||
* testsuite: Add 8 new test files.
|
||||
|
||||
2019-01-05 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.8 released.
|
||||
* Rename File_* to Lzip_*.
|
||||
* main.cc: New options '--in-slots' and '--out-slots'.
|
||||
* main.cc: Increase default in_slots per worker from 2 to 4.
|
||||
* main.cc: Increase default out_slots per worker from 32 to 64.
|
||||
* New options '--in-slots' and '--out-slots'.
|
||||
* main.cc (main): Increase default in_slots per worker from 2 to 4.
|
||||
(main): Increase default out_slots per worker from 32 to 64.
|
||||
(main): Check return value of close( infd ).
|
||||
* lzip.h (Lzip_trailer): New function 'verify_consistency'.
|
||||
* lzip_index.cc: Detect some kinds of corrupt trailers.
|
||||
* main.cc (main): Check return value of close( infd ).
|
||||
* plzip.texi: Improve description of '-0..-9', '-m', and '-s'.
|
||||
* plzip.texi: Improve descriptions of '-0..-9', '-m', and '-s'.
|
||||
* configure: New option '--with-mingw'.
|
||||
* configure: Accept appending to CXXFLAGS; 'CXXFLAGS+=OPTIONS'.
|
||||
Accept appending to CXXFLAGS; 'CXXFLAGS+=OPTIONS'.
|
||||
* INSTALL: Document use of CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
|
||||
|
||||
2018-02-07 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.7 released.
|
||||
* New option '--loose-trailing'.
|
||||
* compress.cc: Use 'LZ_compress_restart_member' and replace input
|
||||
packet queue by a circular buffer to reduce memory fragmentation.
|
||||
* compress.cc: Return one empty packet at a time to reduce mem use.
|
||||
Return one empty packet at a time to reduce memory use.
|
||||
* main.cc: Reduce threads on 32 bit systems to use under 2.22 GiB.
|
||||
* main.cc: New option '--loose-trailing'.
|
||||
(set_c_outname): Do not add a second '.lz' to the arg of '-o'.
|
||||
(cleanup_and_fail): Suppress messages from other threads.
|
||||
* Improve corrupt header detection to HD = 3 on seekable files.
|
||||
(On all files with lzlib 1.10 or newer).
|
||||
* Replace 'bits/byte' with inverse compression ratio in output.
|
||||
* Show progress of decompression at verbosity level 2 (-vv).
|
||||
* Show progress of (de)compression only if stderr is a terminal.
|
||||
* main.cc: Do not add a second .lz extension to the arg of -o.
|
||||
* Show dictionary size at verbosity level 4 (-vvvv).
|
||||
* main.cc (cleanup_and_fail): Suppress messages from other threads.
|
||||
* list.cc: Add missing '#include <pthread.h>'.
|
||||
* plzip.texi: New chapter 'Output'.
|
||||
* plzip.texi (Memory requirements): Add table.
|
||||
* plzip.texi (Program design): Add a block diagram.
|
||||
* plzip.texi: New chapter 'Meaning of plzip's output'.
|
||||
(Memory requirements): Add table.
|
||||
(Program design): Add a block diagram.
|
||||
|
||||
2017-04-12 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
|
@@ -92,14 +102,13 @@
|
|||
2016-05-14 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.5 released.
|
||||
* main.cc: New option '-a, --trailing-error'.
|
||||
* New option '-a, --trailing-error'.
|
||||
* main.cc (main): Delete '--output' file if infd is a terminal.
|
||||
* main.cc (main): Don't use stdin more than once.
|
||||
(main): Don't use stdin more than once.
|
||||
* plzip.texi: New chapters 'Trailing data' and 'Examples'.
|
||||
* configure: Avoid warning on some shells when testing for g++.
|
||||
* Makefile.in: Detect the existence of install-info.
|
||||
* check.sh: A POSIX shell is required to run the tests.
|
||||
* check.sh: Don't check error messages.
|
||||
* check.sh: Require a POSIX shell. Don't check error messages.
|
||||
|
||||
2015-07-09 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
|
@@ -136,14 +145,14 @@
|
|||
|
||||
* Version 1.0 released.
|
||||
* compress.cc: Change 'deliver_packet' to 'deliver_packets'.
|
||||
* Scalability of decompression from/to regular files has been
|
||||
increased by removing splitter and muxer when not needed.
|
||||
* The number of worker threads is now limited to the number of
|
||||
members when decompressing from a regular file.
|
||||
* Increase scalability of decompression from/to regular files by
|
||||
removing splitter and muxer when not needed.
|
||||
* Limit the number of worker threads to the number of members when
|
||||
decompressing from a regular file.
|
||||
* configure: Options now accept a separate argument.
|
||||
* Makefile.in: New targets 'install-as-lzip' and 'install-bin'.
|
||||
* main.cc: Use 'setmode' instead of '_setmode' on Windows and OS/2.
|
||||
* main.cc: Define 'strtoull' to 'std::strtoul' on Windows.
|
||||
(main): Use 'setmode' instead of '_setmode' on Windows and OS/2.
|
||||
|
||||
2012-03-01 Antonio Diaz Diaz <ant_diaz@teleline.es>
|
||||
|
||||
|
@@ -154,13 +163,13 @@
|
|||
2012-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>
|
||||
|
||||
* Version 0.8 released.
|
||||
* main.cc: New option '-F, --recompress'.
|
||||
* New option '-F, --recompress'.
|
||||
* decompress.cc (decompress): Show compression ratio.
|
||||
* main.cc (close_and_set_permissions): Inability to change output
|
||||
file attributes has been downgraded from error to warning.
|
||||
(main): Set stdin/stdout in binary mode on OS2.
|
||||
* Small change in '--help' output and man page.
|
||||
* Change quote characters in messages as advised by GNU Standards.
|
||||
* main.cc: Set stdin/stdout in binary mode on OS2.
|
||||
* compress.cc: Reduce memory use of compressed packets.
|
||||
* decompress.cc: Use Boyer-Moore algorithm to search for headers.
|
||||
|
||||
|
@@ -174,7 +183,7 @@
|
|||
* main.cc (open_instream): Don't show the message
|
||||
" and '--stdout' was not specified" for directories, etc.
|
||||
Exit with status 1 if any output file exists and is skipped.
|
||||
* main.cc: Fix warning about fchown return value being ignored.
|
||||
Fix warning about fchown's return value being ignored.
|
||||
* testsuite: Rename 'test1' to 'test.txt'. New tests.
|
||||
|
||||
2010-03-20 Antonio Diaz Diaz <ant_diaz@teleline.es>
|
||||
|
@@ -202,9 +211,8 @@
|
|||
* Version 0.3 released.
|
||||
* New option '-B, --data-size'.
|
||||
* Output file is now removed if plzip is interrupted.
|
||||
* This version automatically chooses the smallest possible
|
||||
dictionary size for each member during compression, saving
|
||||
memory during decompression.
|
||||
* Choose automatically the smallest possible dictionary size for
|
||||
each member during compression, saving memory during decompression.
|
||||
* main.cc: New constant 'o_binary'.
|
||||
|
||||
2010-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>
|
||||
|
|
|
@@ -2,8 +2,8 @@
|
|||
DISTNAME = $(pkgname)-$(pkgversion)
|
||||
INSTALL = install
|
||||
INSTALL_PROGRAM = $(INSTALL) -m 755
|
||||
INSTALL_DATA = $(INSTALL) -m 644
|
||||
INSTALL_DIR = $(INSTALL) -d -m 755
|
||||
INSTALL_DATA = $(INSTALL) -m 644
|
||||
SHELL = /bin/sh
|
||||
CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1
|
||||
|
||||
|
@@ -34,7 +34,8 @@ main.o : main.cc
|
|||
|
||||
# prevent 'make' from trying to remake source files
|
||||
$(VPATH)/configure $(VPATH)/Makefile.in $(VPATH)/doc/$(pkgname).texi : ;
|
||||
%.h %.cc : ;
|
||||
MAKEFLAGS += -r
|
||||
.SUFFIXES :
|
||||
|
||||
$(objs) : Makefile
|
||||
arg_parser.o : arg_parser.h
|
||||
|
@@ -133,8 +134,7 @@ dist : doc
|
|||
$(DISTNAME)/testsuite/test.txt \
|
||||
$(DISTNAME)/testsuite/fox.lz \
|
||||
$(DISTNAME)/testsuite/fox_*.lz \
|
||||
$(DISTNAME)/testsuite/test.txt.lz \
|
||||
$(DISTNAME)/testsuite/test_em.txt.lz
|
||||
$(DISTNAME)/testsuite/test.txt.lz
|
||||
rm -f $(DISTNAME)
|
||||
lzip -v -9 $(DISTNAME).tar
|
||||
|
||||
|
|
NEWS (16 lines changed)
@@ -1,14 +1,8 @@
-Changes in version 1.11:
+Changes in version 1.12:
 
-File diagnostics have been reformatted as 'PROGRAM: FILE: MESSAGE'.
+plzip now exits with error status 2 if any empty member is found in a
+multimember file.
 
-Diagnostics caused by invalid arguments to command-line options now show the
-argument and the name of the option.
+Scalability when decompressing to standard output has been increased.
 
-The option '-o, --output' now preserves dates, permissions, and ownership of
-the file when (de)compressing exactly one file.
-
-The option '-o, --output' now creates missing intermediate directories when
-writing to a file.
-
-The variable MAKEINFO has been added to configure and Makefile.in.
+The chapter 'Syntax of command-line arguments' has been added to the manual.
README (28 lines changed)
@@ -1,26 +1,26 @@
 Description
 
-Plzip is a massively parallel (multi-threaded) implementation of lzip,
-compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
+Plzip is a massively parallel (multi-threaded) implementation of lzip. Plzip
+uses the compression library lzlib.
 
 Lzip is a lossless data compressor with a user interface similar to the one
-of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
-chain-Algorithm' (LZMA) stream format to maximize interoperability. The
-maximum dictionary size is 512 MiB so that any lzip file can be decompressed
-on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
-checking. Lzip can compress about as fast as gzip (lzip -0) or compress most
-files more than bzip2 (lzip -9). Decompression speed is intermediate between
-gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
-perspective. Lzip has been designed, written, and tested with great care to
-replace gzip and bzip2 as the standard general-purpose compressed format for
-Unix-like systems.
+of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel-Ziv-Markov
+chain-Algorithm) designed to achieve complete interoperability between
+implementations. The maximum dictionary size is 512 MiB so that any lzip
+file can be decompressed on 32-bit machines. Lzip provides accurate and
+robust 3-factor integrity checking. 'lzip -0' compresses about as fast as
+gzip, while 'lzip -9' compresses most files more than bzip2. Decompression
+speed is intermediate between gzip and bzip2. Lzip provides better data
+recovery capabilities than gzip and bzip2. Lzip has been designed, written,
+and tested with great care to replace gzip and bzip2 as general-purpose
+compressed format for Unix-like systems.
 
 Plzip can compress/decompress large files on multiprocessor machines much
 faster than lzip, at the cost of a slightly reduced compression ratio (0.4
 to 2 percent larger compressed files). Note that the number of usable
 threads is limited by file size; on files larger than a few GB plzip can use
-hundreds of processors, but on files of only a few MB plzip is no faster
-than lzip.
+hundreds of processors, but on files smaller than 1 MiB plzip is no faster
+than lzip (even at compression level -0).
 
 For creation and manipulation of compressed tar archives tarlz can be more
 efficient than using tar and plzip because tarlz is able to keep the
|
|
@@ -75,19 +75,19 @@ bool Arg_parser::parse_long_option( const char * const opt, const char * const a
|
|||
error_ += "' requires an argument";
|
||||
return false;
|
||||
}
|
||||
data.back().argument = &opt[len+3];
|
||||
data.back().argument = &opt[len+3]; // argument may be empty
|
||||
return true;
|
||||
}
|
||||
|
||||
if( options[index].has_arg == yes )
|
||||
if( options[index].has_arg == yes || options[index].has_arg == yme )
|
||||
{
|
||||
if( !arg || !arg[0] )
|
||||
if( !arg || ( options[index].has_arg == yes && !arg[0] ) )
|
||||
{
|
||||
error_ = "option '--"; error_ += options[index].long_name;
|
||||
error_ += "' requires an argument";
|
||||
return false;
|
||||
}
|
||||
++argind; data.back().argument = arg;
|
||||
++argind; data.back().argument = arg; // argument may be empty
|
||||
return true;
|
||||
}
|
||||
|
||||
|
@@ -123,15 +123,16 @@ bool Arg_parser::parse_short_option( const char * const opt, const char * const
|
|||
{
|
||||
data.back().argument = &opt[cind]; ++argind; cind = 0;
|
||||
}
|
||||
else if( options[index].has_arg == yes )
|
||||
else if( options[index].has_arg == yes || options[index].has_arg == yme )
|
||||
{
|
||||
if( !arg || !arg[0] )
|
||||
if( !arg || ( options[index].has_arg == yes && !arg[0] ) )
|
||||
{
|
||||
error_ = "option requires an argument -- '"; error_ += c;
|
||||
error_ += '\'';
|
||||
return false;
|
||||
}
|
||||
data.back().argument = arg; ++argind; cind = 0;
|
||||
++argind; cind = 0;
|
||||
data.back().argument = arg; // argument may be empty
|
||||
}
|
||||
}
|
||||
return true;
|
||||
|
|
arg_parser.h (10 lines changed)
@@ -36,14 +36,18 @@
 The argument '--' terminates all options; any following arguments are
 treated as non-option arguments, even if they begin with a hyphen.
 
-The syntax for optional option arguments is '-<short_option><argument>'
-(without whitespace), or '--<long_option>=<argument>'.
+The syntax of options with an optional argument is
+'-<short_option><argument>' (without whitespace), or
+'--<long_option>=<argument>'.
+
+The syntax of options with an empty argument is '-<short_option> ""',
+'--<long_option> ""', or '--<long_option>=""'.
 */
 
 class Arg_parser
   {
 public:
-  enum Has_arg { no, yes, maybe };
+  enum Has_arg { no, yes, maybe, yme }; // yme = yes but maybe empty
 
   struct Option
     {
|
|
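The rule that the new 'yme' value adds to the parser can be read off the patched condition `!arg || ( has_arg == yes && !arg[0] )` in arg_parser.cc. The following is a minimal standalone sketch of that rule only, not the actual Arg_parser code; the function name `separate_argument_ok` is hypothetical and the enum is a local copy made for illustration.

```cpp
// Local copy of the enum shown in the diff, for illustration only.
enum Has_arg { no, yes, maybe, yme };   // yme = yes but maybe empty

// Hypothetical helper: returns true if 'arg' (the next command-line word,
// or a null pointer if there is none) is acceptable as the separate
// argument of an option declared with 'has_arg'.
// It mirrors the rejection test in the patched parser:
//   !arg || ( has_arg == yes && !arg[0] )
bool separate_argument_ok( const Has_arg has_arg, const char * const arg )
  {
  if( has_arg != yes && has_arg != yme ) return false; // no separate argument
  if( !arg ) return false;                      // argument is missing
  if( has_arg == yes && !arg[0] ) return false; // "" rejected for 'yes'
  return true;                                  // 'yme' also accepts ""
  }
```

Under this sketch, an option declared as 'yes' still rejects an empty string, while one declared as 'yme' accepts `-o ""` or `--output ""`, which matches the syntax described in the header comment.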
compress.cc (83 lines changed)
|
@@ -112,7 +112,6 @@ void xlock( pthread_mutex_t * const mutex )
|
|||
{ show_error( "pthread_mutex_lock", errcode ); cleanup_and_fail(); }
|
||||
}
|
||||
|
||||
|
||||
void xunlock( pthread_mutex_t * const mutex )
|
||||
{
|
||||
const int errcode = pthread_mutex_unlock( mutex );
|
||||
|
@@ -158,7 +157,7 @@ struct Packet // data block with a serial number
|
|||
int size; // number of bytes in data (if any)
|
||||
unsigned id; // serial number assigned as received
|
||||
Packet() : data( 0 ), size( 0 ), id( 0 ) {}
|
||||
void init( uint8_t * const d, const int s, const unsigned i )
|
||||
void assign( uint8_t * const d, const int s, const unsigned i )
|
||||
{ data = d; size = s; id = i; }
|
||||
};
|
||||
|
||||
|
@@ -176,7 +175,7 @@ private:
|
|||
unsigned deliver_id; // id of next packet to be delivered
|
||||
Slot_tally slot_tally; // limits the number of input packets
|
||||
std::vector< Packet > circular_ibuffer;
|
||||
std::vector< const Packet * > circular_obuffer;
|
||||
std::vector< const Packet * > circular_obuffer; // pointers to ibuffer
|
||||
int num_working; // number of workers still running
|
||||
const int num_slots; // max packets in circulation
|
||||
pthread_mutex_t imutex;
|
||||
|
@@ -212,7 +211,7 @@ public:
|
|||
{
|
||||
slot_tally.get_slot(); // wait for a free slot
|
||||
xlock( &imutex );
|
||||
circular_ibuffer[receive_id % num_slots].init( data, size, receive_id );
|
||||
circular_ibuffer[receive_id % num_slots].assign( data, size, receive_id );
|
||||
++receive_id;
|
||||
xsignal( &iav_or_eof );
|
||||
xunlock( &imutex );
|
||||
|
@@ -221,7 +220,6 @@ public:
|
|||
// distribute a packet to a worker
|
||||
Packet * distribute_packet()
|
||||
{
|
||||
Packet * ipacket = 0;
|
||||
xlock( &imutex );
|
||||
++icheck_counter;
|
||||
while( receive_id == distrib_id && !eof ) // no packets to distribute
|
||||
|
@@ -230,15 +228,13 @@ public:
|
|||
xwait( &iav_or_eof, &imutex );
|
||||
}
|
||||
if( receive_id != distrib_id )
|
||||
{ ipacket = &circular_ibuffer[distrib_id % num_slots]; ++distrib_id; }
|
||||
{ Packet * ipacket = &circular_ibuffer[distrib_id % num_slots];
|
||||
++distrib_id; xunlock( &imutex ); return ipacket; }
|
||||
xunlock( &imutex );
|
||||
if( !ipacket ) // EOF
|
||||
{
|
||||
xlock( &omutex ); // notify muxer when last worker exits
|
||||
if( --num_working == 0 ) xsignal( &oav_or_exit );
|
||||
xunlock( &omutex );
|
||||
}
|
||||
return ipacket;
|
||||
xlock( &omutex ); // notify muxer when last worker exits
|
||||
if( --num_working == 0 ) xsignal( &oav_or_exit );
|
||||
xunlock( &omutex );
|
||||
return 0; // EOF
|
||||
}
|
||||
|
||||
// collect a packet from a worker
|
||||
|
@@ -307,30 +303,38 @@ public:
|
|||
|
||||
struct Worker_arg
|
||||
{
|
||||
Packet_courier * courier;
|
||||
const Pretty_print * pp;
|
||||
int dictionary_size;
|
||||
int match_len_limit;
|
||||
int offset;
|
||||
Packet_courier & courier;
|
||||
const Pretty_print & pp;
|
||||
const int dictionary_size;
|
||||
const int match_len_limit;
|
||||
const int offset;
|
||||
Worker_arg( Packet_courier & co, const Pretty_print & pp_, const int dis,
|
||||
const int mll, const int off )
|
||||
: courier( co ), pp( pp_ ), dictionary_size( dis ),
|
||||
match_len_limit( mll ), offset( off ) {}
|
||||
};
|
||||
|
||||
struct Splitter_arg
|
||||
{
|
||||
struct Worker_arg worker_arg;
|
||||
pthread_t * worker_threads;
|
||||
int infd;
|
||||
int data_size;
|
||||
Worker_arg worker_arg;
|
||||
pthread_t * const worker_threads;
|
||||
const int data_size;
|
||||
const int infd;
|
||||
int num_workers; // returned by splitter to main thread
|
||||
Splitter_arg( Packet_courier & co, const Pretty_print & pp_, const int dis,
|
||||
const int mll, const int off, pthread_t * wt, const int das,
|
||||
const int ifd, const int nw )
|
||||
: worker_arg( co, pp_, dis, mll, off ), worker_threads( wt ),
|
||||
data_size( das ), infd( ifd ), num_workers( nw ) {}
|
||||
};
|
||||
|
||||
|
||||
/* Get packets from courier, replace their contents, and return them to
|
||||
courier. */
|
||||
// get packets from courier, replace their contents, and return them to courier
|
||||
extern "C" void * cworker( void * arg )
|
||||
{
|
||||
const Worker_arg & tmp = *(const Worker_arg *)arg;
|
||||
Packet_courier & courier = *tmp.courier;
|
||||
const Pretty_print & pp = *tmp.pp;
|
||||
Packet_courier & courier = tmp.courier;
|
||||
const Pretty_print & pp = tmp.pp;
|
||||
const int dictionary_size = tmp.dictionary_size;
|
||||
const int match_len_limit = tmp.match_len_limit;
|
||||
const int offset = tmp.offset;
|
||||
|
@@ -407,8 +411,8 @@ extern "C" void * cworker( void * arg )
|
|||
extern "C" void * csplitter( void * arg )
|
||||
{
|
||||
Splitter_arg & tmp = *(Splitter_arg *)arg;
|
||||
Packet_courier & courier = *tmp.worker_arg.courier;
|
||||
const Pretty_print & pp = *tmp.worker_arg.pp;
|
||||
Packet_courier & courier = tmp.worker_arg.courier;
|
||||
const Pretty_print & pp = tmp.worker_arg.pp;
|
||||
pthread_t * const worker_threads = tmp.worker_threads;
|
||||
const int offset = tmp.worker_arg.offset;
|
||||
const int infd = tmp.infd;
|
||||
|
@@ -436,11 +440,7 @@ extern "C" void * csplitter( void * arg )
|
|||
}
|
||||
if( size < data_size ) break; // EOF
|
||||
}
|
||||
else
|
||||
{
|
||||
delete[] data;
|
||||
break;
|
||||
}
|
||||
else { delete[] data; break; }
|
||||
}
|
||||
courier.finish( tmp.num_workers - i ); // no more packets to send
|
||||
tmp.num_workers = i;
|
||||
|
@@ -465,7 +465,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
|
|||
out_size += opacket->size;
|
||||
|
||||
if( writeblock( outfd, opacket->data, opacket->size ) != opacket->size )
|
||||
{ pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
|
||||
{ pp(); show_error( write_error_msg, errno ); cleanup_and_fail(); }
|
||||
delete[] opacket->data;
|
||||
courier.return_empty_packet();
|
||||
}
|
||||
|
@@ -475,8 +475,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
|
|||
} // end namespace
|
||||
|
||||
|
||||
/* Init the courier, then start the splitter and the workers and call the
|
||||
muxer. */
|
||||
// init the courier, then start the splitter and the workers and call the muxer
|
||||
int compress( const unsigned long long cfile_size,
|
||||
const int data_size, const int dictionary_size,
|
||||
const int match_len_limit, const int num_workers,
|
||||
|
@@ -496,16 +495,8 @@ int compress( const unsigned long long cfile_size,
|
|||
pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
|
||||
if( !worker_threads ) { pp( mem_msg ); return 1; }
|
||||
|
||||
Splitter_arg splitter_arg;
|
||||
splitter_arg.worker_arg.courier = &courier;
|
||||
splitter_arg.worker_arg.pp = &pp;
|
||||
splitter_arg.worker_arg.dictionary_size = dictionary_size;
|
||||
splitter_arg.worker_arg.match_len_limit = match_len_limit;
|
||||
splitter_arg.worker_arg.offset = offset;
|
||||
splitter_arg.worker_threads = worker_threads;
|
||||
splitter_arg.infd = infd;
|
||||
splitter_arg.data_size = data_size;
|
||||
splitter_arg.num_workers = num_workers;
|
||||
Splitter_arg splitter_arg( courier, pp, dictionary_size, match_len_limit,
|
||||
offset, worker_threads, data_size, infd, num_workers );
|
||||
|
||||
pthread_t splitter_thread;
|
||||
int errcode = pthread_create( &splitter_thread, 0, csplitter, &splitter_arg );
|
||||
|
|
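The compress.cc hunks above index 'circular_ibuffer' with `receive_id % num_slots` and `distrib_id % num_slots`. As a rough illustration of that indexing scheme, here is a single-threaded sketch. The class name `Ring` is hypothetical, the mutexes and condition variables of the real Packet_courier are left out, and the slot accounting is simplified: here a slot counts as free as soon as its packet is distributed, whereas the original appears to free slots only when the muxer returns them through Slot_tally.

```cpp
#include <vector>

struct Packet                   // data block with a serial number
  {
  const char * data;
  int size;                     // number of bytes in data (if any)
  unsigned id;                  // serial number assigned as received
  Packet() : data( 0 ), size( 0 ), id( 0 ) {}
  void assign( const char * const d, const int s, const unsigned i )
    { data = d; size = s; id = i; }
  };

class Ring                      // hypothetical, single-threaded
  {
  std::vector< Packet > circular_ibuffer;
  unsigned receive_id;          // id assigned to next packet received
  unsigned distrib_id;          // id of next packet to be distributed
  const unsigned num_slots;     // max packets in circulation

public:
  explicit Ring( const unsigned slots )
    : circular_ibuffer( slots ), receive_id( 0 ), distrib_id( 0 ),
      num_slots( slots ) {}

  // store a packet; returns false instead of waiting if no slot is free
  bool receive_packet( const char * const data, const int size )
    {
    if( receive_id - distrib_id >= num_slots ) return false;
    circular_ibuffer[receive_id % num_slots].assign( data, size, receive_id );
    ++receive_id;
    return true;
    }

  // hand out the oldest stored packet; returns 0 if there is none
  Packet * distribute_packet()
    {
    if( receive_id == distrib_id ) return 0;
    Packet * const ipacket = &circular_ibuffer[distrib_id % num_slots];
    ++distrib_id;
    return ipacket;
    }
  };
```

Because the serial numbers only grow and the index is taken modulo the number of slots, packets are handed out in the order they were received without moving any data inside the vector.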
configure (vendored, 4 lines changed)
@@ -6,7 +6,7 @@
 # to copy, distribute, and modify it.
 
 pkgname=plzip
-pkgversion=1.11
+pkgversion=1.12-rc1
 progname=plzip
 with_mingw=
 srctrigger=doc/${pkgname}.texi
@@ -115,7 +115,7 @@ while [ $# != 0 ] ; do
     exit 1 ;;
   esac
 
-  # Check if the option took a separate argument
+  # Check whether the option took a separate argument
   if [ "${arg2}" = yes ] ; then
     if [ $# != 0 ] ; then args="${args} \"$1\"" ; shift
     else echo "configure: Missing argument to '${option}'" 1>&2
dec_stdout.cc (119 lines changed)
|
@@ -46,10 +46,10 @@ struct Packet // data block
|
|||
uint8_t * data; // data may be null if size == 0
|
||||
int size; // number of bytes in data (if any)
|
||||
bool eom; // end of member
|
||||
Packet() : data( 0 ), size( 0 ), eom( true ) {}
|
||||
Packet() : data( 0 ), size( 0 ), eom( false ) {}
|
||||
Packet( uint8_t * const d, const int s, const bool e )
|
||||
: data( d ), size( s ), eom ( e ) {}
|
||||
~Packet() { if( data ) delete[] data; }
|
||||
void delete_data() { if( data ) { delete[] data; data = 0; } }
|
||||
};
|
||||
|
||||
|
||||
|
@@ -59,8 +59,8 @@ public:
|
|||
unsigned ocheck_counter;
|
||||
unsigned owait_counter;
|
||||
private:
|
||||
int deliver_worker_id; // worker queue currently delivering packets
|
||||
std::vector< std::queue< const Packet * > > opacket_queues;
|
||||
int deliver_id; // worker queue currently delivering packets
|
||||
std::vector< std::queue< Packet > > opacket_queues;
|
||||
int num_working; // number of workers still running
|
||||
const int num_workers; // number of workers
|
||||
const unsigned out_slots; // max output packets per queue
|
||||
|
@@ -75,10 +75,9 @@ private:
|
|||
public:
|
||||
Packet_courier( const Shared_retval & sh_ret, const int workers,
|
||||
const int slots )
|
||||
: ocheck_counter( 0 ), owait_counter( 0 ), deliver_worker_id( 0 ),
|
||||
opacket_queues( workers ), num_working( workers ),
|
||||
num_workers( workers ), out_slots( slots ), slot_av( workers ),
|
||||
shared_retval( sh_ret )
|
||||
: ocheck_counter( 0 ), owait_counter( 0 ), deliver_id( 0 ),
|
||||
opacket_queues( workers ), num_working( workers ), num_workers( workers ),
|
||||
out_slots( slots ), slot_av( workers ), shared_retval( sh_ret )
|
||||
{
|
||||
xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
|
||||
for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] );
|
||||
|
@@ -89,7 +88,7 @@ public:
|
|||
if( shared_retval() ) // cleanup to avoid memory leaks
|
||||
for( int i = 0; i < num_workers; ++i )
|
||||
while( !opacket_queues[i].empty() )
|
||||
{ delete opacket_queues[i].front(); opacket_queues[i].pop(); }
|
||||
{ opacket_queues[i].front().delete_data(); opacket_queues[i].pop(); }
|
||||
for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
|
||||
xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
|
||||
}
|
||||
|
@@ -102,49 +101,47 @@ public:
|
|||
xunlock( &omutex );
|
||||
}
|
||||
|
||||
// collect a packet from a worker, discard packet on error
|
||||
void collect_packet( const Packet * const opacket, const int worker_id )
|
||||
// make a packet with data received from a worker, discard data on error
|
||||
void collect_packet( const int worker_id, uint8_t * const data,
|
||||
const int size, const bool eom )
|
||||
{
|
||||
Packet opacket( data, size, eom );
|
||||
xlock( &omutex );
|
||||
if( opacket->data )
|
||||
if( data )
|
||||
while( opacket_queues[worker_id].size() >= out_slots )
|
||||
{
|
||||
if( shared_retval() ) { delete opacket; goto done; }
|
||||
if( shared_retval() ) { delete[] data; goto out; }
|
||||
xwait( &slot_av[worker_id], &omutex );
|
||||
}
|
||||
opacket_queues[worker_id].push( opacket );
|
||||
if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
|
||||
done:
|
||||
xunlock( &omutex );
|
||||
if( worker_id == deliver_id ) xsignal( &oav_or_exit );
|
||||
out: xunlock( &omutex );
|
||||
}
|
||||
|
||||
/* deliver a packet to muxer
|
||||
if packet->eom, move to next queue
|
||||
if packet data == 0, wait again */
|
||||
const Packet * deliver_packet()
|
||||
/* deliver packets to muxer
|
||||
if opacket.eom, move to next queue
|
||||
if opacket.data == 0, skip opacket */
|
||||
void deliver_packets( std::vector< Packet > & packet_vector )
|
||||
{
|
||||
const Packet * opacket = 0;
|
||||
packet_vector.clear();
|
||||
xlock( &omutex );
|
||||
++ocheck_counter;
|
||||
while( true )
|
||||
{
|
||||
while( opacket_queues[deliver_worker_id].empty() && num_working > 0 )
|
||||
do {
|
||||
while( opacket_queues[deliver_id].empty() && num_working > 0 )
|
||||
{ ++owait_counter; xwait( &oav_or_exit, &omutex ); }
|
||||
while( true )
|
||||
{
|
||||
++owait_counter;
|
||||
xwait( &oav_or_exit, &omutex );
|
||||
if( opacket_queues[deliver_id].empty() ) break;
|
||||
Packet opacket = opacket_queues[deliver_id].front();
|
||||
opacket_queues[deliver_id].pop();
|
||||
if( opacket_queues[deliver_id].size() + 1 == out_slots )
|
||||
xsignal( &slot_av[deliver_id] );
|
||||
if( opacket.eom && ++deliver_id >= num_workers ) deliver_id = 0;
|
||||
if( opacket.data ) packet_vector.push_back( opacket );
|
||||
}
|
||||
if( opacket_queues[deliver_worker_id].empty() ) break;
|
||||
opacket = opacket_queues[deliver_worker_id].front();
|
||||
opacket_queues[deliver_worker_id].pop();
|
||||
if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
|
||||
xsignal( &slot_av[deliver_worker_id] );
|
||||
if( opacket->eom && ++deliver_worker_id >= num_workers )
|
||||
deliver_worker_id = 0;
|
||||
if( opacket->data ) break;
|
||||
delete opacket; opacket = 0;
|
||||
}
|
||||
while( packet_vector.empty() && num_working > 0 );
|
||||
xunlock( &omutex );
|
||||
return opacket;
|
||||
}
|
||||
|
||||
bool finished() // all packets delivered to muxer
|
||||
|
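The hunk above replaces 'deliver_packet' (one packet per call) with 'deliver_packets', which drains every packet that is ready, in member order, into a vector. The sketch below shows only that queue-walking logic under simplifying assumptions: it is single-threaded, it takes the queues and 'deliver_id' as parameters instead of class members, and it omits the locking, the waiting on 'oav_or_exit', and the 'slot_av' signalling of the real courier.

```cpp
#include <queue>
#include <vector>

struct Packet                   // data block
  {
  const char * data;            // data may be null if size == 0
  int size;                     // number of bytes in data (if any)
  bool eom;                     // end of member
  Packet( const char * const d, const int s, const bool e )
    : data( d ), size( s ), eom( e ) {}
  };

/* Collect the packets that are ready into packet_vector.
   If opacket.eom, move to the next queue (wrapping around).
   If opacket.data == 0, skip opacket.
   Stops when the queue currently delivering is empty. */
void deliver_packets( std::vector< std::queue< Packet > > & opacket_queues,
                      int & deliver_id,
                      std::vector< Packet > & packet_vector )
  {
  const int num_workers = static_cast< int >( opacket_queues.size() );
  packet_vector.clear();
  while( !opacket_queues[deliver_id].empty() )
    {
    const Packet opacket = opacket_queues[deliver_id].front();
    opacket_queues[deliver_id].pop();
    if( opacket.eom && ++deliver_id >= num_workers ) deliver_id = 0;
    if( opacket.data ) packet_vector.push_back( opacket );
    }
  }
```

Returning a batch means the caller takes the lock once per group of packets rather than once per packet, which is presumably what the NEWS entry about increased scalability when decompressing to standard output refers to.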
@@ -163,9 +160,14 @@ struct Worker_arg
|
|||
Packet_courier * courier;
|
||||
const Pretty_print * pp;
|
||||
Shared_retval * shared_retval;
|
||||
int worker_id;
|
||||
int num_workers;
|
||||
int infd;
|
||||
int num_workers;
|
||||
int worker_id;
|
||||
void assign( const Lzip_index & li, Packet_courier & co,
|
||||
const Pretty_print & pp_, Shared_retval & sr,
|
||||
const int ifd, const int nw, const int wi )
|
||||
{ lzip_index = &li; courier = &co; pp = &pp_; shared_retval = &sr;
|
||||
infd = ifd; num_workers = nw; worker_id = wi; }
|
||||
};
|
||||
|
||||
|
||||
|
@@ -179,9 +181,9 @@ extern "C" void * dworker_o( void * arg )
|
|||
Packet_courier & courier = *tmp.courier;
|
||||
const Pretty_print & pp = *tmp.pp;
|
||||
Shared_retval & shared_retval = *tmp.shared_retval;
|
||||
const int worker_id = tmp.worker_id;
|
||||
const int num_workers = tmp.num_workers;
|
||||
const int infd = tmp.infd;
|
||||
const int num_workers = tmp.num_workers;
|
||||
const int worker_id = tmp.worker_id;
|
||||
const int buffer_size = 65536;
|
||||
|
||||
int new_pos = 0;
|
||||
|
@@ -231,12 +233,11 @@ extern "C" void * dworker_o( void * arg )
|
|||
const bool eom = LZ_decompress_finished( decoder ) == 1;
|
||||
if( new_pos == max_packet_size || eom ) // make data packet
|
||||
{
|
||||
const Packet * const opacket =
|
||||
new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
|
||||
courier.collect_packet( opacket, worker_id );
|
||||
courier.collect_packet( worker_id, ( new_pos > 0 ) ? new_data : 0,
|
||||
new_pos, eom );
|
||||
if( new_pos > 0 ) { new_pos = 0; new_data = 0; }
|
||||
if( eom )
|
||||
{ LZ_decompress_reset( decoder ); // prepare for new member
|
||||
{ LZ_decompress_reset( decoder ); // prepare for next member
|
||||
break; }
|
||||
}
|
||||
if( rd == 0 ) break;
|
||||
|
@@ -262,23 +263,28 @@ done:
|
|||
void muxer( Packet_courier & courier, const Pretty_print & pp,
|
||||
Shared_retval & shared_retval, const int outfd )
|
||||
{
|
||||
std::vector< Packet > packet_vector;
|
||||
while( true )
|
||||
{
|
||||
const Packet * const opacket = courier.deliver_packet();
|
||||
if( !opacket ) break; // queue is empty. all workers exited
|
||||
courier.deliver_packets( packet_vector );
|
||||
if( packet_vector.empty() ) break; // queue is empty. all workers exited
|
||||
|
||||
if( shared_retval() == 0 &&
|
||||
writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
|
||||
shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( "Write error", errno ); }
|
||||
delete opacket;
|
||||
for( unsigned i = 0; i < packet_vector.size(); ++i )
|
||||
{
|
||||
Packet & opacket = packet_vector[i];
|
||||
if( shared_retval() == 0 &&
|
||||
writeblock( outfd, opacket.data, opacket.size ) != opacket.size &&
|
||||
shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( write_error_msg, errno ); }
|
||||
opacket.delete_data();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
} // end namespace
|
||||
|
||||
|
||||
// init the courier, then start the workers and call the muxer.
|
||||
// init the courier, then start the workers and call the muxer
|
||||
int dec_stdout( const int num_workers, const int infd, const int outfd,
|
||||
const Pretty_print & pp, const int debug_level,
|
||||
const int out_slots, const Lzip_index & lzip_index )
|
||||
|
@ -294,13 +300,8 @@ int dec_stdout( const int num_workers, const int infd, const int outfd,
|
|||
int i = 0; // number of workers started
|
||||
for( ; i < num_workers; ++i )
|
||||
{
|
||||
worker_args[i].lzip_index = &lzip_index;
|
||||
worker_args[i].courier = &courier;
|
||||
worker_args[i].pp = &pp;
|
||||
worker_args[i].shared_retval = &shared_retval;
|
||||
worker_args[i].worker_id = i;
|
||||
worker_args[i].num_workers = num_workers;
|
||||
worker_args[i].infd = infd;
|
||||
worker_args[i].assign( lzip_index, courier, pp, shared_retval, infd,
|
||||
num_workers, i );
|
||||
const int errcode =
|
||||
pthread_create( &worker_threads[i], 0, dworker_o, &worker_args[i] );
|
||||
if( errcode )
|
||||
|
|
191
dec_stream.cc
|
@ -54,10 +54,10 @@ struct Packet // data block
|
|||
uint8_t * data; // data may be null if size == 0
|
||||
int size; // number of bytes in data (if any)
|
||||
bool eom; // end of member
|
||||
Packet() : data( 0 ), size( 0 ), eom( true ) {}
|
||||
Packet() : data( 0 ), size( 0 ), eom( false ) {}
|
||||
Packet( uint8_t * const d, const int s, const bool e )
|
||||
: data( d ), size( s ), eom ( e ) {}
|
||||
~Packet() { if( data ) delete[] data; }
|
||||
void delete_data() { if( data ) { delete[] data; data = 0; } }
|
||||
};
|
||||
|
||||
|
||||
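Review note on the hunk above: the queues now hold `Packet` by value, and the destructor is replaced by an explicit `delete_data()`. A minimal sketch of why that pairing matters, assuming the struct is exactly as shown in the hunk: copies are shallow, so the buffer must be released by exactly one holder. `roundtrip_sum` is a hypothetical helper written for illustration and is not part of plzip.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <queue>

// Same shape as the Packet in the hunk above: no destructor, so copies are
// shallow and exactly one holder must call delete_data().
struct Packet
  {
  uint8_t * data;       // data may be null if size == 0
  int size;             // number of bytes in data (if any)
  bool eom;             // end of member
  Packet() : data( 0 ), size( 0 ), eom( false ) {}
  Packet( uint8_t * const d, const int s, const bool e )
    : data( d ), size( s ), eom( e ) {}
  void delete_data() { if( data ) { delete[] data; data = 0; } }
  };

// Hypothetical helper (not in plzip): pass one buffer through a queue by
// value and return the sum of its bytes, freeing the buffer exactly once.
int roundtrip_sum( const int size )
  {
  std::queue< Packet > q;
  uint8_t * const buf = new uint8_t[size];
  std::memset( buf, 1, size );
  q.push( Packet( buf, size, true ) );  // shallow copy stored in the queue

  Packet p = q.front();                 // second shallow copy, same buffer
  q.pop();                              // drops the queued copy; buffer untouched
  assert( p.data == buf );
  int sum = 0;
  for( int i = 0; i < p.size; ++i ) sum += p.data[i];
  p.delete_data();                      // the single point of release
  assert( p.data == 0 );
  return sum;
  }
```

With a destructor that freed `data`, the `q.pop()` above would release the buffer while `p` still pointed at it, which is presumably why the destructor was dropped together with the switch to value queues.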
|
@ -69,11 +69,11 @@ public:
|
|||
unsigned ocheck_counter;
|
||||
unsigned owait_counter;
|
||||
private:
|
||||
int receive_worker_id; // worker queue currently receiving packets
|
||||
int deliver_worker_id; // worker queue currently delivering packets
|
||||
int receive_id; // worker queue currently receiving packets
|
||||
int deliver_id; // worker queue currently delivering packets
|
||||
Slot_tally slot_tally; // limits the number of input packets
|
||||
std::vector< std::queue< const Packet * > > ipacket_queues;
|
||||
std::vector< std::queue< const Packet * > > opacket_queues;
|
||||
std::vector< std::queue< Packet > > ipacket_queues;
|
||||
std::vector< std::queue< Packet > > opacket_queues;
|
||||
int num_working; // number of workers still running
|
||||
const int num_workers; // number of workers
|
||||
const unsigned out_slots; // max output packets per queue
|
||||
|
@ -94,11 +94,11 @@ public:
|
|||
const int in_slots, const int oslots )
|
||||
: icheck_counter( 0 ), iwait_counter( 0 ),
|
||||
ocheck_counter( 0 ), owait_counter( 0 ),
|
||||
receive_worker_id( 0 ), deliver_worker_id( 0 ),
|
||||
slot_tally( in_slots ), ipacket_queues( workers ),
|
||||
opacket_queues( workers ), num_working( workers ),
|
||||
num_workers( workers ), out_slots( oslots ), slot_av( workers ),
|
||||
shared_retval( sh_ret ), eof( false ), trailing_data_found_( false )
|
||||
receive_id( 0 ), deliver_id( 0 ), slot_tally( in_slots ),
|
||||
ipacket_queues( workers ), opacket_queues( workers ),
|
||||
num_working( workers ), num_workers( workers ),
|
||||
out_slots( oslots ), slot_av( workers ), shared_retval( sh_ret ),
|
||||
eof( false ), trailing_data_found_( false )
|
||||
{
|
||||
xinit_mutex( &imutex ); xinit_cond( &iav_or_eof );
|
||||
xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
|
||||
|
@ -111,9 +111,9 @@ public:
|
|||
for( int i = 0; i < num_workers; ++i )
|
||||
{
|
||||
while( !ipacket_queues[i].empty() )
|
||||
{ delete ipacket_queues[i].front(); ipacket_queues[i].pop(); }
|
||||
{ ipacket_queues[i].front().delete_data(); ipacket_queues[i].pop(); }
|
||||
while( !opacket_queues[i].empty() )
|
||||
{ delete opacket_queues[i].front(); opacket_queues[i].pop(); }
|
||||
{ opacket_queues[i].front().delete_data(); opacket_queues[i].pop(); }
|
||||
}
|
||||
for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
|
||||
xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
|
||||
|
@ -125,19 +125,18 @@ public:
|
|||
void receive_packet( uint8_t * const data, const int size, const bool eom )
|
||||
{
|
||||
if( shared_retval() ) { delete[] data; return; } // discard packet on error
|
||||
const Packet * const ipacket = new Packet( data, size, eom );
|
||||
const Packet ipacket( data, size, eom );
|
||||
slot_tally.get_slot(); // wait for a free slot
|
||||
xlock( &imutex );
|
||||
ipacket_queues[receive_worker_id].push( ipacket );
|
||||
ipacket_queues[receive_id].push( ipacket );
|
||||
xbroadcast( &iav_or_eof );
|
||||
xunlock( &imutex );
|
||||
if( eom && ++receive_worker_id >= num_workers ) receive_worker_id = 0;
|
||||
if( eom && ++receive_id >= num_workers ) receive_id = 0;
|
||||
}
|
||||
|
||||
// distribute a packet to a worker
|
||||
const Packet * distribute_packet( const int worker_id )
|
||||
Packet distribute_packet( const int worker_id )
|
||||
{
|
||||
const Packet * ipacket = 0;
|
||||
xlock( &imutex );
|
||||
++icheck_counter;
|
||||
while( ipacket_queues[worker_id].empty() && !eof )
|
||||
|
@ -147,63 +146,58 @@ public:
|
|||
}
|
||||
if( !ipacket_queues[worker_id].empty() )
|
||||
{
|
||||
ipacket = ipacket_queues[worker_id].front();
|
||||
const Packet ipacket = ipacket_queues[worker_id].front();
|
||||
ipacket_queues[worker_id].pop();
|
||||
xunlock( &imutex ); slot_tally.leave_slot(); return ipacket;
|
||||
}
|
||||
xunlock( &imutex );
|
||||
if( ipacket ) slot_tally.leave_slot();
|
||||
else // no more packets
|
||||
{
|
||||
xlock( &omutex ); // notify muxer when last worker exits
|
||||
if( --num_working == 0 ) xsignal( &oav_or_exit );
|
||||
xunlock( &omutex );
|
||||
}
|
||||
return ipacket;
|
||||
xunlock( &imutex ); // no more packets
|
||||
xlock( &omutex ); // notify muxer when last worker exits
|
||||
if( --num_working == 0 ) xsignal( &oav_or_exit );
|
||||
xunlock( &omutex );
|
||||
return Packet();
|
||||
}
|
||||
|
||||
// collect a packet from a worker, discard packet on error
|
||||
void collect_packet( const Packet * const opacket, const int worker_id )
|
||||
// make a packet with data received from a worker, discard data on error
|
||||
void collect_packet( const int worker_id, uint8_t * const data,
|
||||
const int size, const bool eom )
|
||||
{
|
||||
Packet opacket( data, size, eom );
|
||||
xlock( &omutex );
|
||||
if( opacket->data )
|
||||
if( data )
|
||||
while( opacket_queues[worker_id].size() >= out_slots )
|
||||
{
|
||||
if( shared_retval() ) { delete opacket; goto done; }
|
||||
if( shared_retval() ) { delete[] data; goto out; }
|
||||
xwait( &slot_av[worker_id], &omutex );
|
||||
}
|
||||
opacket_queues[worker_id].push( opacket );
|
||||
if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
|
||||
done:
|
||||
xunlock( &omutex );
|
||||
if( worker_id == deliver_id ) xsignal( &oav_or_exit );
|
||||
out: xunlock( &omutex );
|
||||
}
|
||||
|
||||
/* deliver a packet to muxer
|
||||
if packet->eom, move to next queue
|
||||
if packet data == 0, wait again */
|
||||
const Packet * deliver_packet()
|
||||
/* deliver packets to muxer
|
||||
if opacket.eom, move to next queue
|
||||
if opacket.data == 0, skip opacket */
|
||||
void deliver_packets( std::vector< Packet > & packet_vector )
|
||||
{
|
||||
const Packet * opacket = 0;
|
||||
packet_vector.clear();
|
||||
xlock( &omutex );
|
||||
++ocheck_counter;
|
||||
while( true )
|
||||
{
|
||||
while( opacket_queues[deliver_worker_id].empty() && num_working > 0 )
|
||||
do {
|
||||
while( opacket_queues[deliver_id].empty() && num_working > 0 )
|
||||
{ ++owait_counter; xwait( &oav_or_exit, &omutex ); }
|
||||
while( true )
|
||||
{
|
||||
++owait_counter;
|
||||
xwait( &oav_or_exit, &omutex );
|
||||
if( opacket_queues[deliver_id].empty() ) break;
|
||||
Packet opacket = opacket_queues[deliver_id].front();
|
||||
opacket_queues[deliver_id].pop();
|
||||
if( opacket_queues[deliver_id].size() + 1 == out_slots )
|
||||
xsignal( &slot_av[deliver_id] );
|
||||
if( opacket.eom && ++deliver_id >= num_workers ) deliver_id = 0;
|
||||
if( opacket.data ) packet_vector.push_back( opacket );
|
||||
}
|
||||
if( opacket_queues[deliver_worker_id].empty() ) break;
|
||||
opacket = opacket_queues[deliver_worker_id].front();
|
||||
opacket_queues[deliver_worker_id].pop();
|
||||
if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
|
||||
xsignal( &slot_av[deliver_worker_id] );
|
||||
if( opacket->eom && ++deliver_worker_id >= num_workers )
|
||||
deliver_worker_id = 0;
|
||||
if( opacket->data ) break;
|
||||
delete opacket; opacket = 0;
|
||||
}
|
||||
while( packet_vector.empty() && num_working > 0 );
|
||||
xunlock( &omutex );
|
||||
return opacket;
|
||||
}
|
||||
|
||||
void add_sizes( const unsigned long long partial_in_size,
|
||||
|
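Review note on `deliver_packets`: the muxer now receives a batch of packets per lock acquisition instead of one. The draining rule can be sketched without the mutex and condition variables, under the assumption that the inner loop behaves as in the hunk above. `Sketch_packet` and `drain_queues` are hypothetical names; a `std::string` stands in for the raw buffer, and an empty string plays the role of `data == 0`.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Simplified stand-in for Packet (illustration only).
struct Sketch_packet
  {
  std::string data;
  bool eom;             // end of member
  Sketch_packet( const std::string & d, const bool e ) : data( d ), eom( e ) {}
  };

// Hypothetical, single-threaded rendition of the inner loop of
// deliver_packets: drain the current queue, advance deliver_id round-robin
// on eom, and keep only the packets that carry data.
void drain_queues( std::vector< std::queue< Sketch_packet > > & queues,
                   int & deliver_id,
                   std::vector< Sketch_packet > & packet_vector )
  {
  const int num_workers = static_cast< int >( queues.size() );
  assert( num_workers > 0 && deliver_id >= 0 && deliver_id < num_workers );
  packet_vector.clear();
  while( !queues[deliver_id].empty() )
    {
    Sketch_packet opacket = queues[deliver_id].front();
    queues[deliver_id].pop();
    if( opacket.eom && ++deliver_id >= num_workers ) deliver_id = 0;
    if( !opacket.data.empty() ) packet_vector.push_back( opacket );
    }
  }
```

Because `deliver_id` advances only on `eom`, the batch preserves member order across worker queues even when a later worker has finished first. The real function additionally waits on `oav_or_exit` and signals `slot_av`, which this sketch omits.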
@ -252,17 +246,29 @@ struct Worker_arg
|
|||
bool loose_trailing;
|
||||
bool testing;
|
||||
bool nocopy; // avoid copying decompressed data when testing
|
||||
void assign( Packet_courier & co, const Pretty_print & pp_,
|
||||
Shared_retval & sr, const bool it, const bool lt,
|
||||
const bool t, const bool nc )
|
||||
{ courier = &co; pp = &pp_; shared_retval = &sr; worker_id = 0;
|
||||
ignore_trailing = it; loose_trailing = lt; testing = t; nocopy = nc; }
|
||||
};
|
||||
|
||||
struct Splitter_arg
|
||||
{
|
||||
struct Worker_arg worker_arg;
|
||||
Worker_arg * worker_args;
|
||||
pthread_t * worker_threads;
|
||||
unsigned long long cfile_size;
|
||||
int infd;
|
||||
Worker_arg worker_arg;
|
||||
Worker_arg * const worker_args;
|
||||
pthread_t * const worker_threads;
|
||||
const unsigned long long cfile_size;
|
||||
const int infd;
|
||||
unsigned dictionary_size; // returned by splitter to main thread
|
||||
int num_workers; // returned by splitter to main thread
|
||||
Splitter_arg( Packet_courier & co, const Pretty_print & pp_,
|
||||
Shared_retval & sr, const bool it, const bool lt,
|
||||
const bool t, const bool nc, Worker_arg * wa, pthread_t * wt,
|
||||
const unsigned long long cfs, const int ifd, const int nw )
|
||||
: worker_args( wa ), worker_threads( wt ), cfile_size( cfs ),
|
||||
infd( ifd ), dictionary_size( 0 ), num_workers( nw )
|
||||
{ worker_arg.assign( co, pp_, sr, it, lt, t, nc ); }
|
||||
};
|
||||
|
||||
|
||||
|
@ -291,22 +297,22 @@ extern "C" void * dworker_s( void * arg )
|
|||
|
||||
while( true )
|
||||
{
|
||||
const Packet * const ipacket = courier.distribute_packet( worker_id );
|
||||
if( !ipacket ) break; // no more packets to process
|
||||
Packet ipacket = courier.distribute_packet( worker_id );
|
||||
if( !ipacket.data ) break; // no more packets to process
|
||||
|
||||
int written = 0;
|
||||
while( !draining ) // else discard trailing data or drain queue
|
||||
{
|
||||
if( LZ_decompress_write_size( decoder ) > 0 && written < ipacket->size )
|
||||
if( LZ_decompress_write_size( decoder ) > 0 && written < ipacket.size )
|
||||
{
|
||||
const int wr = LZ_decompress_write( decoder, ipacket->data + written,
|
||||
ipacket->size - written );
|
||||
const int wr = LZ_decompress_write( decoder, ipacket.data + written,
|
||||
ipacket.size - written );
|
||||
if( wr < 0 ) internal_error( "library error (LZ_decompress_write)." );
|
||||
written += wr;
|
||||
if( written > ipacket->size )
|
||||
if( written > ipacket.size )
|
||||
internal_error( "ipacket size exceeded in worker." );
|
||||
}
|
||||
if( ipacket->eom && written == ipacket->size )
|
||||
if( ipacket.eom && written == ipacket.size )
|
||||
LZ_decompress_finish( decoder );
|
||||
unsigned long long total_in = 0; // detect empty member + corrupt header
|
||||
while( !draining ) // read and pack decompressed data
|
||||
|
@ -353,14 +359,13 @@ extern "C" void * dworker_s( void * arg )
|
|||
{
|
||||
if( !testing ) // make data packet
|
||||
{
|
||||
const Packet * const opacket =
|
||||
new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
|
||||
courier.collect_packet( opacket, worker_id );
|
||||
courier.collect_packet( worker_id, ( new_pos > 0 ) ? new_data : 0,
|
||||
new_pos, eom );
|
||||
if( new_pos > 0 ) new_data = 0;
|
||||
}
|
||||
new_pos = 0;
|
||||
if( eom )
|
||||
{ LZ_decompress_reset( decoder ); // prepare for new member
|
||||
{ LZ_decompress_reset( decoder ); // prepare for next member
|
||||
break; }
|
||||
}
|
||||
if( rd == 0 )
|
||||
|
@ -369,9 +374,9 @@ extern "C" void * dworker_s( void * arg )
|
|||
if( total_in == size ) break; else total_in = size;
|
||||
}
|
||||
}
|
||||
if( !ipacket->data || written == ipacket->size ) break;
|
||||
if( !ipacket.data || written == ipacket.size ) break;
|
||||
}
|
||||
delete ipacket;
|
||||
ipacket.delete_data();
|
||||
}
|
||||
|
||||
if( new_data ) delete[] new_data;
|
||||
|
@ -404,7 +409,7 @@ bool start_worker( const Worker_arg & worker_arg,
|
|||
packaging and distribution to workers.
|
||||
Start a worker per member up to a maximum of num_workers.
|
||||
*/
|
||||
extern "C" void * dsplitter_s( void * arg )
|
||||
extern "C" void * dsplitter( void * arg )
|
||||
{
|
||||
Splitter_arg & tmp = *(Splitter_arg *)arg;
|
||||
const Worker_arg & worker_arg = tmp.worker_arg;
|
||||
|
@ -546,16 +551,21 @@ fail:
|
|||
void muxer( Packet_courier & courier, const Pretty_print & pp,
|
||||
Shared_retval & shared_retval, const int outfd )
|
||||
{
|
||||
std::vector< Packet > packet_vector;
|
||||
while( true )
|
||||
{
|
||||
const Packet * const opacket = courier.deliver_packet();
|
||||
if( !opacket ) break; // queue is empty. all workers exited
|
||||
courier.deliver_packets( packet_vector );
|
||||
if( packet_vector.empty() ) break; // queue is empty. all workers exited
|
||||
|
||||
if( shared_retval() == 0 &&
|
||||
writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
|
||||
shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( "Write error", errno ); }
|
||||
delete opacket;
|
||||
for( unsigned i = 0; i < packet_vector.size(); ++i )
|
||||
{
|
||||
Packet & opacket = packet_vector[i];
|
||||
if( shared_retval() == 0 &&
|
||||
writeblock( outfd, opacket.data, opacket.size ) != opacket.size &&
|
||||
shared_retval.set_value( 1 ) )
|
||||
{ pp(); show_error( write_error_msg, errno ); }
|
||||
opacket.delete_data();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -590,23 +600,12 @@ int dec_stream( const unsigned long long cfile_size, const int num_workers,
|
|||
const bool nocopy = false;
|
||||
#endif
|
||||
|
||||
Splitter_arg splitter_arg;
|
||||
splitter_arg.worker_arg.courier = &courier;
|
||||
splitter_arg.worker_arg.pp = &pp;
|
||||
splitter_arg.worker_arg.shared_retval = &shared_retval;
|
||||
splitter_arg.worker_arg.worker_id = 0;
|
||||
splitter_arg.worker_arg.ignore_trailing = cl_opts.ignore_trailing;
|
||||
splitter_arg.worker_arg.loose_trailing = cl_opts.loose_trailing;
|
||||
splitter_arg.worker_arg.testing = ( outfd < 0 );
|
||||
splitter_arg.worker_arg.nocopy = nocopy;
|
||||
splitter_arg.worker_args = worker_args;
|
||||
splitter_arg.worker_threads = worker_threads;
|
||||
splitter_arg.cfile_size = cfile_size;
|
||||
splitter_arg.infd = infd;
|
||||
splitter_arg.num_workers = num_workers;
|
||||
Splitter_arg splitter_arg( courier, pp, shared_retval,
|
||||
cl_opts.ignore_trailing, cl_opts.loose_trailing, outfd < 0, nocopy,
|
||||
worker_args, worker_threads, cfile_size, infd, num_workers );
|
||||
|
||||
pthread_t splitter_thread;
|
||||
int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg );
|
||||
int errcode = pthread_create( &splitter_thread, 0, dsplitter, &splitter_arg );
|
||||
if( errcode )
|
||||
{ show_error( "Can't create splitter thread", errcode );
|
||||
delete[] worker_threads; delete[] worker_args; return 1; }
|
||||
|
|
|
@ -115,7 +115,7 @@ int pwriteblock( const int fd, const uint8_t * const buf, const int size,
|
|||
}
|
||||
|
||||
|
||||
void decompress_error( struct LZ_Decoder * const decoder,
|
||||
void decompress_error( LZ_Decoder * const decoder,
|
||||
const Pretty_print & pp,
|
||||
Shared_retval & shared_retval, const int worker_id )
|
||||
{
|
||||
|
@ -158,11 +158,16 @@ struct Worker_arg
|
|||
const Lzip_index * lzip_index;
|
||||
const Pretty_print * pp;
|
||||
Shared_retval * shared_retval;
|
||||
int worker_id;
|
||||
int num_workers;
|
||||
int infd;
|
||||
int num_workers;
|
||||
int outfd;
|
||||
int worker_id;
|
||||
bool nocopy; // avoid copying decompressed data when testing
|
||||
void assign( const Lzip_index & li, const Pretty_print & pp_,
|
||||
Shared_retval & sr, const int ifd, const int nw,
|
||||
const int ofd, const int wi, const bool nc )
|
||||
{ lzip_index = &li; pp = &pp_; shared_retval = &sr; infd = ifd;
|
||||
num_workers = nw; outfd = ofd; worker_id = wi; nocopy = nc; }
|
||||
};
|
||||
|
||||
|
||||
|
@ -243,7 +248,7 @@ extern "C" void * dworker( void * arg )
|
|||
{
|
||||
if( data_rest != 0 )
|
||||
internal_error( "final data_rest is not zero." );
|
||||
LZ_decompress_reset( decoder ); // prepare for new member
|
||||
LZ_decompress_reset( decoder ); // prepare for next member
|
||||
break;
|
||||
}
|
||||
if( rd == 0 ) break;
|
||||
|
@ -264,11 +269,11 @@ done:
|
|||
} // end namespace
|
||||
|
||||
|
||||
// start the workers and wait for them to finish.
|
||||
// start the workers and wait for them to finish
|
||||
int decompress( const unsigned long long cfile_size, int num_workers,
|
||||
const int infd, const int outfd, const Cl_options & cl_opts,
|
||||
const Pretty_print & pp, const int debug_level,
|
||||
const int in_slots, const int out_slots,
|
||||
const int in_slots, const int out_slots, const bool from_stdin,
|
||||
const bool infd_isreg, const bool one_to_one )
|
||||
{
|
||||
if( !infd_isreg )
|
||||
|
@ -284,11 +289,11 @@ int decompress( const unsigned long long cfile_size, int num_workers,
|
|||
}
|
||||
if( lzip_index.retval() != 0 ) // corrupt or invalid input file
|
||||
{
|
||||
if( lzip_index.bad_magic() )
|
||||
show_file_error( pp.name(), lzip_index.error().c_str() );
|
||||
else pp( lzip_index.error().c_str() );
|
||||
if( lzip_index.good_magic() ) pp( lzip_index.error().c_str() );
|
||||
else show_file_error( pp.name(), lzip_index.error().c_str() );
|
||||
return lzip_index.retval();
|
||||
}
|
||||
const bool multi_empty = !from_stdin && lzip_index.multi_empty();
|
||||
|
||||
if( num_workers > lzip_index.members() ) num_workers = lzip_index.members();
|
||||
|
||||
|
@ -301,8 +306,11 @@ int decompress( const unsigned long long cfile_size, int num_workers,
|
|||
if( debug_level & 2 ) std::fputs( "decompress file to stdout.\n", stderr );
|
||||
if( verbosity >= 1 ) pp();
|
||||
show_progress( 0, cfile_size, &pp ); // init
|
||||
return dec_stdout( num_workers, infd, outfd, pp, debug_level, out_slots,
|
||||
lzip_index );
|
||||
const int tmp = dec_stdout( num_workers, infd, outfd, pp, debug_level,
|
||||
out_slots, lzip_index );
|
||||
if( tmp ) return tmp;
|
||||
if( multi_empty ) { show_file_error( pp.name(), empty_msg ); return 2; }
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -325,14 +333,8 @@ int decompress( const unsigned long long cfile_size, int num_workers,
|
|||
int i = 0; // number of workers started
|
||||
for( ; i < num_workers; ++i )
|
||||
{
|
||||
worker_args[i].lzip_index = &lzip_index;
|
||||
worker_args[i].pp = &pp;
|
||||
worker_args[i].shared_retval = &shared_retval;
|
||||
worker_args[i].worker_id = i;
|
||||
worker_args[i].num_workers = num_workers;
|
||||
worker_args[i].infd = infd;
|
||||
worker_args[i].outfd = outfd;
|
||||
worker_args[i].nocopy = nocopy;
|
||||
worker_args[i].assign( lzip_index, pp, shared_retval, infd, num_workers,
|
||||
outfd, i, nocopy );
|
||||
const int errcode =
|
||||
pthread_create( &worker_threads[i], 0, dworker, &worker_args[i] );
|
||||
if( errcode )
|
||||
|
@ -359,5 +361,6 @@ int decompress( const unsigned long long cfile_size, int num_workers,
|
|||
std::fprintf( stderr,
|
||||
"workers started %8u\n", num_workers );
|
||||
|
||||
if( multi_empty ) { show_file_error( pp.name(), empty_msg ); return 2; }
|
||||
return 0;
|
||||
}
|
||||
|
|
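Review note on the `multi_empty` change: the ChangeLog entry "Return 2 if any empty member is found in a multimember file" appears to reduce to the small decision below. This is a sketch under the assumption that an earlier error status always takes precedence, as it does in the `dec_stdout` branch of the hunk. `final_status` and its parameter names are hypothetical and do not exist in plzip.

```cpp
#include <cassert>

// Hypothetical condensation of the status logic added to decompress().
// worker_status     : value returned by the decompression stage (0 = success)
// from_stdin        : true if the input was redirected to standard input
// index_multi_empty : what lzip_index.multi_empty() would report
int final_status( const int worker_status, const bool from_stdin,
                  const bool index_multi_empty )
  {
  if( worker_status != 0 ) return worker_status;  // earlier error wins
  const bool multi_empty = !from_stdin && index_multi_empty;
  return multi_empty ? 2 : 0;                     // 2 = corrupt or invalid input
  }
```

For example, `final_status( 0, false, true )` yields 2, while the same file read from standard input yields 0, matching the manual's statement that such a file "is accepted if redirected to standard input".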
35
doc/plzip.1
|
@ -1,32 +1,33 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
|
||||
.TH PLZIP "1" "January 2024" "plzip 1.11" "User Commands"
|
||||
.TH PLZIP "1" "November 2024" "plzip 1.12-rc1" "User Commands"
|
||||
.SH NAME
|
||||
plzip \- reduces the size of files
|
||||
.SH SYNOPSIS
|
||||
.B plzip
|
||||
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||
.SH DESCRIPTION
|
||||
Plzip is a massively parallel (multi\-threaded) implementation of lzip,
|
||||
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
|
||||
Plzip is a massively parallel (multi\-threaded) implementation of lzip. Plzip
|
||||
uses the compression library lzlib.
|
||||
.PP
|
||||
Lzip is a lossless data compressor with a user interface similar to the one
|
||||
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov
|
||||
chain\-Algorithm' (LZMA) stream format to maximize interoperability. The
|
||||
maximum dictionary size is 512 MiB so that any lzip file can be decompressed
|
||||
on 32\-bit machines. Lzip provides accurate and robust 3\-factor integrity
|
||||
checking. Lzip can compress about as fast as gzip (lzip \fB\-0\fR) or compress most
|
||||
files more than bzip2 (lzip \fB\-9\fR). Decompression speed is intermediate between
|
||||
gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
|
||||
perspective. Lzip has been designed, written, and tested with great care to
|
||||
replace gzip and bzip2 as the standard general\-purpose compressed format for
|
||||
Unix\-like systems.
|
||||
of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel\-Ziv\-Markov
|
||||
chain\-Algorithm) designed to achieve complete interoperability between
|
||||
implementations. The maximum dictionary size is 512 MiB so that any lzip
|
||||
file can be decompressed on 32\-bit machines. Lzip provides accurate and
|
||||
robust 3\-factor integrity checking. 'lzip \fB\-0\fR' compresses about as fast as
|
||||
gzip, while 'lzip \fB\-9\fR' compresses most files more than bzip2. Decompression
|
||||
speed is intermediate between gzip and bzip2. Lzip provides better data
|
||||
recovery capabilities than gzip and bzip2. Lzip has been designed, written,
|
||||
and tested with great care to replace gzip and bzip2 as general\-purpose
|
||||
compressed format for Unix\-like systems.
|
||||
.PP
|
||||
Plzip can compress/decompress large files on multiprocessor machines much
|
||||
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
|
||||
to 2 percent larger compressed files). Note that the number of usable
|
||||
threads is limited by file size; on files larger than a few GB plzip can use
|
||||
hundreds of processors, but on files of only a few MB plzip is no faster
|
||||
than lzip.
|
||||
hundreds of processors, but on files smaller than 1 MiB plzip is no faster
|
||||
than lzip (even at compression level \fB\-0\fR).
|
||||
The number of threads defaults to the number of processors.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-h\fR, \fB\-\-help\fR
|
||||
|
@ -132,8 +133,8 @@ License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
|||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
There is NO WARRANTY, to the extent permitted by law.
|
||||
Using lzlib 1.14
|
||||
Using LZ_API_VERSION = 1014
|
||||
Using lzlib 1.15\-rc1
|
||||
Using LZ_API_VERSION = 1015
|
||||
.SH "SEE ALSO"
|
||||
The full documentation for
|
||||
.B plzip
|
||||
|
|
347
doc/plzip.info
|
@ -11,21 +11,22 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Plzip Manual
|
||||
************
|
||||
|
||||
This manual is for Plzip (version 1.11, 21 January 2024).
|
||||
This manual is for Plzip (version 1.12-rc1, 19 November 2024).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of plzip
|
||||
* Output:: Meaning of plzip's output
|
||||
* Invoking plzip:: Command-line interface
|
||||
* Program design:: Internal structure of plzip
|
||||
* Memory requirements:: Memory required to compress and decompress
|
||||
* Minimum file sizes:: Minimum file sizes required for full speed
|
||||
* File format:: Detailed format of the compressed file
|
||||
* Trailing data:: Extra data appended to the file
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
* Introduction:: Purpose and features of plzip
|
||||
* Output:: Meaning of plzip's output
|
||||
* Invoking plzip:: Command-line interface
|
||||
* Argument syntax:: By convention, options start with a hyphen
|
||||
* File format:: Detailed format of the compressed file
|
||||
* Program design:: Internal structure of plzip
|
||||
* Memory requirements:: Memory required to compress and decompress
|
||||
* Minimum file sizes:: Minimum file sizes required for full speed
|
||||
* Trailing data:: Extra data appended to the file
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2009-2024 Antonio Diaz Diaz.
|
||||
|
@ -39,27 +40,27 @@ File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
|
|||
1 Introduction
|
||||
**************
|
||||
|
||||
Plzip is a massively parallel (multi-threaded) implementation of lzip,
|
||||
compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
|
||||
Plzip is a massively parallel (multi-threaded) implementation of lzip.
|
||||
Plzip uses the compression library lzlib.
|
||||
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
|
||||
chain-Algorithm' (LZMA) stream format to maximize interoperability. The
|
||||
maximum dictionary size is 512 MiB so that any lzip file can be decompressed
|
||||
on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
|
||||
checking. Lzip can compress about as fast as gzip (lzip -0) or compress most
|
||||
files more than bzip2 (lzip -9). Decompression speed is intermediate between
|
||||
gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
|
||||
perspective. Lzip has been designed, written, and tested with great care to
|
||||
replace gzip and bzip2 as the standard general-purpose compressed format for
|
||||
Unix-like systems.
|
||||
one of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel-Ziv-Markov
|
||||
chain-Algorithm) designed to achieve complete interoperability between
|
||||
implementations. The maximum dictionary size is 512 MiB so that any lzip
|
||||
file can be decompressed on 32-bit machines. Lzip provides accurate and
|
||||
robust 3-factor integrity checking. 'lzip -0' compresses about as fast as
|
||||
gzip, while 'lzip -9' compresses most files more than bzip2. Decompression
|
||||
speed is intermediate between gzip and bzip2. Lzip provides better data
|
||||
recovery capabilities than gzip and bzip2. Lzip has been designed, written,
|
||||
and tested with great care to replace gzip and bzip2 as general-purpose
|
||||
compressed format for Unix-like systems.
|
||||
|
||||
Plzip can compress/decompress large files on multiprocessor machines much
|
||||
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
|
||||
to 2 percent larger compressed files). Note that the number of usable
|
||||
threads is limited by file size; on files larger than a few GB plzip can use
|
||||
hundreds of processors, but on files of only a few MB plzip is no faster
|
||||
than lzip. *Note Minimum file sizes::.
|
||||
hundreds of processors, but on files smaller than 1 MiB plzip is no faster
|
||||
than lzip (even at compression level -0). *Note Minimum file sizes::.
|
||||
|
||||
For creation and manipulation of compressed tar archives tarlz can be
|
||||
more efficient than using tar and plzip because tarlz is able to keep the
|
||||
|
@ -96,9 +97,9 @@ makes it safer than compressors returning ambiguous warning values (like
|
|||
gzip) when it is used as a back end for other programs like tar or zutils.
|
||||
|
||||
Plzip automatically uses for each file the largest dictionary size that
|
||||
does not exceed neither the file size nor the limit given. Keep in mind
|
||||
that the decompression memory requirement is affected at compression time
|
||||
by the choice of dictionary size limit. *Note Memory requirements::.
|
||||
does not exceed neither the file size nor the limit given. The dictionary
|
||||
size used for decompression is the same dictionary size used for
|
||||
compression. *Note Memory requirements::.
|
||||
|
||||
When compressing, plzip replaces every file given in the command line
|
||||
with a compressed version of itself, with the name "original_name.lz". When
|
||||
|
@ -174,7 +175,7 @@ have been compressed. Decompressed is used to refer to data which have
|
|||
undergone the process of decompression.
|
||||
|
||||
|
||||
File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top
|
||||
File: plzip.info, Node: Invoking plzip, Next: Argument syntax, Prev: Output, Up: Top
|
||||
|
||||
3 Invoking plzip
|
||||
****************
|
||||
|
@ -189,8 +190,7 @@ means standard input. It can be mixed with other FILES and is read just
|
|||
once, the first time it appears in the command line. Remember to prepend
|
||||
'./' to any file name beginning with a hyphen, or use '--'.
|
||||
|
||||
plzip supports the following options: *Note Argument syntax:
|
||||
(arg_parser)Argument syntax.
|
||||
plzip supports the following options: *Note Argument syntax::.
|
||||
|
||||
'-h'
|
||||
'--help'
|
||||
|
@ -235,7 +235,8 @@ once, the first time it appears in the command line. Remember to prepend
|
|||
status 1. If a file fails to decompress, or is a terminal, plzip exits
|
||||
immediately with error status 2 without decompressing the rest of the
|
||||
files. A terminal is considered an uncompressed file, and therefore
|
||||
invalid.
|
||||
invalid. A multimember file with one or more empty members is accepted
|
||||
if redirected to standard input.
|
||||
|
||||
'-f'
|
||||
'--force'
|
||||
|
@ -259,7 +260,8 @@ once, the first time it appears in the command line. Remember to prepend
|
|||
'-v', the dictionary size, the number of members in the file, and the
|
||||
amount of trailing data (if any) are also printed. With '-vv', the
|
||||
positions and sizes of each member in multimember files are also
|
||||
printed.
|
||||
printed. A multimember file with one or more empty members is accepted
|
||||
if redirected to standard input.
|
||||
|
||||
If any file is damaged, does not exist, can't be opened, or is not
|
||||
regular, the final exit status is > 0. '-lq' can be used to check
|
||||
|
@ -278,8 +280,8 @@ once, the first time it appears in the command line. Remember to prepend
|
|||
'-n N'
|
||||
'--threads=N'
|
||||
Set the maximum number of worker threads, overriding the system's
|
||||
default. Valid values range from 1 to "as many as your system can
|
||||
support". If this option is not used, plzip tries to detect the number
|
||||
default. Valid values range from 1 to as many as your system can
|
||||
support. If this option is not used, plzip tries to detect the number
|
||||
of processors in the system and use it as default value. When
|
||||
compressing on a 32 bit system, plzip tries to limit the memory use to
|
||||
under 2.22 GiB (4 worker threads at level -9) by reducing the number
|
||||
|
@ -338,7 +340,8 @@ once, the first time it appears in the command line. Remember to prepend
|
|||
fails the test, does not exist, can't be opened, or is a terminal,
|
||||
plzip continues testing the rest of the files. A final diagnostic is
|
||||
shown at verbosity level 1 or higher if any file fails the test when
|
||||
testing multiple files.
|
||||
testing multiple files. A multimember file with one or more empty
|
||||
members is accepted if redirected to standard input.
|
||||
|
||||
'-v'
|
||||
'--verbose'
|
||||
|
@ -368,6 +371,7 @@ once, the first time it appears in the command line. Remember to prepend
|
|||
'-s64MiB -m273'
|
||||
|
||||
Level Dictionary size (-s) Match length limit (-m)
|
||||
------------------------------------------------------
|
||||
-0 64 KiB 16 bytes
|
||||
-1 1 MiB 5 bytes
|
||||
-2 1.5 MiB 6 bytes
|
||||
|
@ -387,7 +391,7 @@ once, the first time it appears in the command line. Remember to prepend
|
|||
When decompressing, testing, or listing, allow trailing data whose
|
||||
first bytes are so similar to the magic bytes of a lzip header that
|
||||
they can be confused with a corrupt header. Use this option if a file
|
||||
triggers a "corrupt header" error and the cause is not indeed a
|
||||
triggers a 'corrupt header' error and the cause is not indeed a
|
||||
corrupt header.
|
||||
|
||||
'--in-slots=N'
|
||||
|
@ -421,6 +425,7 @@ and may be followed by a multiplier and an optional 'B' for "byte".
|
|||
Table of SI and binary prefixes (unit multipliers):
|
||||
|
||||
Prefix Value | Prefix Value
|
||||
----------------------------------------------------------------------
|
||||
k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024)
|
||||
M megabyte (10^6) | Mi mebibyte (2^20)
|
||||
G gigabyte (10^9) | Gi gibibyte (2^30)
|
||||
|
@ -439,9 +444,131 @@ corrupt or invalid input file, 3 for an internal consistency error (e.g.,
|
|||
bug) which caused plzip to panic.
|
||||
|
||||
|
||||
File: plzip.info, Node: Program design, Next: Memory requirements, Prev: Invoking plzip, Up: Top
|
||||
File: plzip.info, Node: Argument syntax, Next: File format, Prev: Invoking plzip, Up: Top
|
||||
|
||||
4 Internal structure of plzip
|
||||
4 Syntax of command-line arguments
|
||||
**********************************
|
||||
|
||||
POSIX recommends these conventions for command-line arguments.
|
||||
|
||||
* A command-line argument is an option if it begins with a hyphen ('-').
|
||||
|
||||
* Option names are single alphanumeric characters.
|
||||
|
||||
* Certain options require an argument.
|
||||
|
||||
* An option and its argument may or may not appear as separate tokens.
|
||||
(In other words, the whitespace separating them is optional, unless the
|
||||
argument is the empty string). Thus, '-o foo' and '-ofoo' are
|
||||
equivalent.
|
||||
|
||||
* One or more options without arguments, followed by at most one option
|
||||
that takes an argument, may follow a hyphen in a single token. Thus,
|
||||
'-abc' is equivalent to '-a -b -c'.
|
||||
|
||||
* Options typically precede other non-option arguments.
|
||||
|
||||
* The argument '--' terminates all options; any following arguments are
|
||||
treated as non-option arguments, even if they begin with a hyphen.
|
||||
|
||||
* A token consisting of a single hyphen character is interpreted as an
|
||||
ordinary non-option argument. By convention, it is used to specify
|
||||
standard input, standard output, or a file named '-'.
|
||||
|
||||
GNU adds "long options" to these conventions:
|
||||
|
||||
* A long option consists of two hyphens ('--') followed by a name made
|
||||
of alphanumeric characters and hyphens. Option names are typically one
|
||||
to three words long, with hyphens to separate words. Abbreviations can
|
||||
be used for the long option names as long as the abbreviations are
|
||||
unique.
|
||||
|
||||
* A long option and its argument may or may not appear as separate
|
||||
tokens. In the latter case they must be separated by an equal sign '='.
|
||||
Thus, '--foo bar' and '--foo=bar' are equivalent.
|
||||
|
||||
The syntax of options with an optional argument is
|
||||
'-<short_option><argument>' (without whitespace), or
|
||||
'--<long_option>=<argument>'.
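The conventions above can be tried out with a short sketch. It uses Python's standard 'getopt.gnu_getopt', not the arg_parser code that plzip itself uses, and declares only a small subset of plzip's options ('-c', '-d', '-v', '-n', '-o' and their long names). The helper name 'parse' is hypothetical.

```python
import getopt

# Short options: a trailing ':' marks an option that requires an argument.
SHORT = "cdvn:o:"
# Long options: a trailing '=' marks an option that requires an argument.
LONG = ["stdout", "decompress", "verbose", "threads=", "output="]


def parse(argv):
    """Return (options, operands) for a list of command-line tokens."""
    return getopt.gnu_getopt(argv, SHORT, LONG)


# An option and its argument may or may not be separate tokens.
print(parse(["-o", "foo"]) == parse(["-ofoo"]))

# Options without arguments may share a hyphen with one final option
# that takes an argument.
print(parse(["-dvn4", "file.lz"]))

# Long options accept '--name value', '--name=value', and unique
# abbreviations of the name.
print(parse(["--threads", "4"]) == parse(["--thr=4"]))

# '--' ends the options; a lone '-' is an ordinary operand.
print(parse(["-d", "--", "-odd-name"]))
print(parse(["-c", "-"]))
```

Abbreviated long names come back expanded to their full form, so a caller only ever has to compare against the names it declared.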
|
||||
|
||||
|
||||
File: plzip.info, Node: File format, Next: Program design, Prev: Argument syntax, Up: Top
|
||||
|
||||
5 File format
|
||||
*************
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
when there is no longer anything to take away.
|
||||
-- Antoine de Saint-Exupery
|
||||
|
||||
In the diagram below, a box like this:
|
||||
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
|
||||
represents one byte; a box like this:
|
||||
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
|
||||
represents a variable number of bytes.
|
||||
|
||||
A lzip file consists of one or more independent "members" (compressed data
|
||||
sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them. Each member can
|
||||
encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
|
||||
size of a multimember file is unlimited. Empty members (data size = 0) are
|
||||
not allowed in multimember files.
|
||||
|
||||
Each member has the following structure:
|
||||
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
|
||||
All multibyte values are stored in little endian order.
|
||||
|
||||
'ID string (the "magic" bytes)'
|
||||
A four byte string, identifying the lzip format, with the value "LZIP"
|
||||
(0x4C, 0x5A, 0x49, 0x50).
|
||||
|
||||
'VN (version number, 1 byte)'
|
||||
Just in case something needs to be modified in the future. 1 for now.
|
||||
|
||||
'DS (coded dictionary size, 1 byte)'
|
||||
The dictionary size is calculated by taking a power of 2 (the base
|
||||
size) and subtracting from it a fraction between 0/16 and 7/16 of the
|
||||
base size.
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
'LZMA stream'
|
||||
The LZMA stream, terminated by an 'End Of Stream' marker. Uses default
|
||||
values for encoder properties. *Note Stream format: (lzip)Stream
|
||||
format, for a complete description.
|
||||
|
||||
'CRC32 (4 bytes)'
|
||||
Cyclic Redundancy Check (CRC) of the original uncompressed data.
|
||||
|
||||
'Data size (8 bytes)'
|
||||
Size of the original uncompressed data.
|
||||
|
||||
'Member size (8 bytes)'
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, improves the checking of stream integrity, and
|
||||
facilitates the safe recovery of undamaged members from multimember
|
||||
files. Lzip limits the member size to 2 PiB to prevent the data size
|
||||
field from overflowing.
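As an illustration of the layout described above, the following sketch decodes the DS byte and walks a buffer from its end, using the Member size field as the distributed index. It is not plzip's code: the function names are hypothetical, the buffer is assumed to hold only complete members with no trailing data, and the LZMA stream is neither decoded nor checked against the CRC.

```python
import struct

HEADER_SIZE = 6     # ID string (4) + VN (1) + DS (1)
TRAILER_SIZE = 20   # CRC32 (4) + Data size (8) + Member size (8)


def decode_dictionary_size(ds):
    """Decode the coded dictionary size byte into a size in bytes."""
    base_log2 = ds & 0x1F          # bits 4-0: log2 of the base size
    numerator = ds >> 5            # bits 7-5: sixteenths to subtract
    if not 12 <= base_log2 <= 29:
        raise ValueError("base size exponent out of range")
    base = 1 << base_log2
    size = base - numerator * (base // 16)
    if size < 4096:
        raise ValueError("dictionary size below 4 KiB")
    return size


def members_from_end(buf):
    """Locate every member in buf by following Member size backwards."""
    found = []
    end = len(buf)
    while end > 0:
        if end < HEADER_SIZE + TRAILER_SIZE:
            raise ValueError("truncated member")
        # All multibyte values are little endian.
        crc32, data_size, member_size = struct.unpack_from(
            "<IQQ", buf, end - TRAILER_SIZE)
        if not HEADER_SIZE + TRAILER_SIZE <= member_size <= end:
            raise ValueError("bad member size")
        start = end - member_size
        magic, version, ds = struct.unpack_from("<4sBB", buf, start)
        if magic != b"LZIP":
            raise ValueError("bad magic bytes")
        found.append({
            "offset": start,
            "version": version,
            "dictionary_size": decode_dictionary_size(ds),
            "crc32": crc32,
            "data_size": data_size,
            "member_size": member_size,
        })
        end = start
    found.reverse()                # report the first member first
    return found


# The example from the DS description: 0xD3 decodes to 320 KiB.
print(decode_dictionary_size(0xD3) // 1024)   # 320
```

With the list returned by 'members_from_end', finding empty members in a multimember file is a matter of looking for entries whose 'data_size' is 0.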
|
||||
|
||||
|
||||
File: plzip.info, Node: Program design, Next: Memory requirements, Prev: File format, Up: Top
|
||||
|
||||
6 Internal structure of plzip
|
||||
*****************************
|
||||
|
||||
When compressing, plzip divides the input file into chunks and compresses as
|
||||
|
@ -456,8 +583,8 @@ because lzip usually produces single-member files, which can't be
|
|||
decompressed in parallel.
|
||||
|
||||
For each input file, a splitter thread and several worker threads are
|
||||
created, acting the main thread as muxer (multiplexer) thread. A "packet
|
||||
courier" takes care of data transfers among threads and limits the maximum
|
||||
created, acting the main thread as muxer (multiplexer) thread. A 'packet
|
||||
courier' takes care of data transfers among threads and limits the maximum
|
||||
number of data blocks (packets) being processed simultaneously.
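The division of work described above can be pictured with a minimal sketch. It is only an approximation of the design, not plzip's implementation: a semaphore plays the role of the packet courier's limit, 'zlib.compress' stands in for the real LZMA compression, and all names ('parallel_compress', 'max_packets', and so on) are hypothetical.

```python
import queue
import threading
import zlib


def parallel_compress(data, chunk_size=65536, workers=4, max_packets=8):
    """Compress data in chunks; return the packets in input order."""
    slots = threading.Semaphore(max_packets)  # limit on packets in flight
    todo = queue.Queue()                      # splitter -> workers
    done = {}                                 # workers -> muxer, by sequence
    done_cv = threading.Condition()
    n_chunks = (len(data) + chunk_size - 1) // chunk_size

    def splitter():
        for seq in range(n_chunks):
            slots.acquire()                   # wait for a free packet slot
            start = seq * chunk_size
            todo.put((seq, data[start:start + chunk_size]))
        for _ in range(workers):
            todo.put(None)                    # one stop marker per worker

    def worker():
        while True:
            item = todo.get()
            if item is None:
                return
            seq, chunk = item
            packed = zlib.compress(chunk)     # stand-in for LZMA compression
            with done_cv:
                done[seq] = packed
                done_cv.notify_all()

    threads = [threading.Thread(target=splitter)]
    threads += [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    output = []
    for seq in range(n_chunks):               # the calling thread is the muxer
        with done_cv:
            while seq not in done:
                done_cv.wait()
            packed = done.pop(seq)
        slots.release()                       # the slot can be reused
        output.append(packed)
    for t in threads:
        t.join()
    return output
```

Because the splitter hands out packets in order and the muxer frees a slot only after consuming the next packet in sequence, the oldest unconsumed packet is always in flight, so the limit cannot stall the pipeline.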
|
||||
|
||||
The splitter reads data blocks from the input file, and distributes them
|
||||
|
@ -486,7 +613,7 @@ only limited by the number of processors available and by I/O speed.
|
|||
|
||||
File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: Program design, Up: Top
|
||||
|
||||
5 Memory required to compress and decompress
|
||||
7 Memory required to compress and decompress
|
||||
********************************************
|
||||
|
||||
The amount of memory required *per worker thread* for decompression or
|
||||
|
@ -520,6 +647,7 @@ The following table shows the memory required *per thread* for compression
|
|||
at a given level, using the default data size for each level:
|
||||
|
||||
Level Memory required
|
||||
------------------------
|
||||
-0 4.875 MiB
|
||||
-1 17.75 MiB
|
||||
-2 26.625 MiB
|
||||
|
@ -532,9 +660,9 @@ Level Memory required
|
|||
-9 568 MiB
|
||||
|
||||
|
||||
File: plzip.info, Node: Minimum file sizes, Next: File format, Prev: Memory requirements, Up: Top
|
||||
File: plzip.info, Node: Minimum file sizes, Next: Trailing data, Prev: Memory requirements, Up: Top
|
||||
|
||||
6 Minimum file sizes required for full compression speed
|
||||
8 Minimum file sizes required for full compression speed
|
||||
********************************************************
|
||||
|
||||
When compressing, plzip divides the input file into chunks and compresses
|
||||
|
@ -569,85 +697,9 @@ Level
|
|||
-9 128 MiB 256 MiB 512 MiB 1 GiB 4 GiB 16 GiB
|
||||
|
||||
|
||||
File: plzip.info, Node: File format, Next: Trailing data, Prev: Minimum file sizes, Up: Top
|
||||
File: plzip.info, Node: Trailing data, Next: Examples, Prev: Minimum file sizes, Up: Top
|
||||
|
||||
7 File format
|
||||
*************
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
when there is no longer anything to take away.
|
||||
-- Antoine de Saint-Exupery
|
||||
|
||||
|
||||
In the diagram below, a box like this:
|
||||
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
|
||||
represents one byte; a box like this:
|
||||
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
|
||||
represents a variable number of bytes.
|
||||
|
||||
|
||||
A lzip file consists of one or more independent "members" (compressed
|
||||
data sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them. Each member can
|
||||
encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
|
||||
size of a multimember file is unlimited.
|
||||
|
||||
Each member has the following structure:
|
||||
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
|
||||
All multibyte values are stored in little endian order.
|
||||
|
||||
'ID string (the "magic" bytes)'
|
||||
A four byte string, identifying the lzip format, with the value "LZIP"
|
||||
(0x4C, 0x5A, 0x49, 0x50).
|
||||
|
||||
'VN (version number, 1 byte)'
|
||||
Just in case something needs to be modified in the future. 1 for now.
|
||||
|
||||
'DS (coded dictionary size, 1 byte)'
|
||||
The dictionary size is calculated by taking a power of 2 (the base
|
||||
size) and subtracting from it a fraction between 0/16 and 7/16 of the
|
||||
base size.
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
'LZMA stream'
|
||||
The LZMA stream, finished by an "End Of Stream" marker. Uses default
|
||||
values for encoder properties. *Note Stream format: (lzip)Stream
|
||||
format, for a complete description.
|
||||
|
||||
'CRC32 (4 bytes)'
|
||||
Cyclic Redundancy Check (CRC) of the original uncompressed data.
|
||||
|
||||
'Data size (8 bytes)'
|
||||
Size of the original uncompressed data.
|
||||
|
||||
'Member size (8 bytes)'
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, improves the checking of stream integrity, and
|
||||
facilitates the safe recovery of undamaged members from multimember
|
||||
files. Lzip limits the member size to 2 PiB to prevent the data size
|
||||
field from overflowing.
|
||||
|
||||
|
||||
|
||||
File: plzip.info, Node: Trailing data, Next: Examples, Prev: File format, Up: Top
|
||||
|
||||
8 Extra data appended to the file
|
||||
9 Extra data appended to the file
|
||||
*********************************
|
||||
|
||||
Sometimes extra data are found appended to a lzip file after the last
|
||||
|
@ -657,7 +709,7 @@ member. Such trailing data may be:
|
|||
example when writing to a tape. It is safe to append any amount of
|
||||
padding zero bytes to a lzip file.
|
||||
|
||||
* Useful data added by the user; an "End Of File" string (to check that
|
||||
* Useful data added by the user; an 'End Of File' string (to check that
|
||||
the file has not been truncated), a cryptographically secure hash, a
|
||||
description of file contents, etc. It is safe to append any amount of
|
||||
text to a lzip file as long as none of the first four bytes of the
|
||||
|
@ -693,8 +745,8 @@ where a file containing trailing data must be rejected, the option
|
|||
|
||||
File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top
|
||||
|
||||
9 A small tutorial with examples
|
||||
********************************
|
||||
10 A small tutorial with examples
|
||||
*********************************
|
||||
|
||||
WARNING! Even if plzip is bug-free, other causes may result in a corrupt
|
||||
compressed file (bugs in the system libraries, memory errors, etc).
|
||||
|
@ -706,38 +758,32 @@ comparing the compressed file with the original because the corruption
|
|||
happens before plzip compresses the RAM contents, resulting in a valid
|
||||
compressed file containing wrong data.
|
||||
|
||||
|
||||
Example 1: Extract all the files from archive 'foo.tar.lz'.
|
||||
|
||||
tar -xf foo.tar.lz
|
||||
or
|
||||
plzip -cd foo.tar.lz | tar -xf -
|
||||
|
||||
|
||||
Example 2: Replace a regular file with its compressed version 'file.lz' and
|
||||
show the compression ratio.
|
||||
|
||||
plzip -v file
|
||||
|
||||
|
||||
Example 3: Like example 2 but the created 'file.lz' has a block size of
|
||||
1 MiB. The compression ratio is not shown.
|
||||
|
||||
plzip -B 1MiB file
|
||||
|
||||
|
||||
Example 4: Restore a regular file from its compressed version 'file.lz'. If
|
||||
the operation is successful, 'file.lz' is removed.
|
||||
|
||||
plzip -d file.lz
|
||||
|
||||
|
||||
Example 5: Check the integrity of the compressed file 'file.lz' and show
|
||||
status.
|
||||
|
||||
plzip -tv file.lz
|
||||
|
||||
|
||||
Example 6: The right way of concatenating the decompressed output of two or
|
||||
more compressed files. *Note Trailing data::.
|
||||
|
||||
|
@ -746,19 +792,16 @@ more compressed files. *Note Trailing data::.
|
|||
Do this instead
|
||||
plzip -cd file1.lz file2.lz file3.lz
|
||||
|
||||
|
||||
Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed data
|
||||
are produced.
|
||||
|
||||
plzip -cd file.lz | dd bs=1024 count=10
|
||||
|
||||
|
||||
Example 8: Decompress 'file.lz' partially from decompressed byte at offset
|
||||
10000 to decompressed byte at offset 14999 (5000 bytes are produced).
|
||||
|
||||
plzip -cd file.lz | dd bs=1000 skip=10 count=5
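The offsets in examples 7 and 8 follow directly from the 'dd' operands. A small sketch of the arithmetic, assuming every read from the pipe returns a full block of 'bs' bytes (the function name is hypothetical):

```python
def dd_byte_range(bs, skip, count):
    """Return (first, last) decompressed byte offsets copied by dd."""
    first = bs * skip                 # blocks skipped before copying starts
    last = first + bs * count - 1     # count blocks of bs bytes are copied
    return first, last


print(dd_byte_range(1000, 10, 5))     # (10000, 14999), as in example 8
print(dd_byte_range(1024, 0, 10))     # (0, 10239), the 10 KiB of example 7
```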
|
||||
|
||||
|
||||
Example 9: Compress a whole device in /dev/sdc and send the output to
|
||||
'file.lz'.
|
||||
|
||||
|
@ -769,7 +812,7 @@ Example 9: Compress a whole device in /dev/sdc and send the output to
|
|||
|
||||
File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||
|
||||
10 Reporting bugs
|
||||
11 Reporting bugs
|
||||
*****************
|
||||
|
||||
There are probably bugs in plzip. There are certainly errors and omissions
|
||||
|
@ -790,6 +833,7 @@ Concept index
|
|||
|