Merging upstream version 1.13.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 8cde372fe7
commit 2edb5552c9
25 changed files with 829 additions and 742 deletions
13
ChangeLog
|
@ -1,3 +1,12 @@
|
|||
2022-01-24 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.13 released.
|
||||
* Decompression time has been reduced by 5-12% depending on the file.
|
||||
* main.c (getnum): Show option name and valid range if error.
|
||||
* Improve several descriptions in manual, '--help', and man page.
|
||||
* clzip.texi: Change GNU Texinfo category to 'Compression'.
|
||||
(Reported by Alfred M. Szmidt).
|
||||
|
||||
2021-01-04 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.12 released.
|
||||
|
@ -22,7 +31,7 @@
|
|||
* main.c (main): Check return value of close( infd ).
|
||||
* main.c: Compile on DOS with DJGPP.
|
||||
* clzip.texi: Improve descriptions of '-0..-9', '-m', and '-s'.
|
||||
* configure: Accept appending to CFLAGS, 'CFLAGS+=OPTIONS'.
|
||||
* configure: Accept appending to CFLAGS; 'CFLAGS+=OPTIONS'.
|
||||
* INSTALL: Document use of CFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
|
||||
|
||||
2018-02-06 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
@ -151,7 +160,7 @@
|
|||
* Translated to C from the C++ source of lzip 1.10.
|
||||
|
||||
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This file is a collection of facts, and thus it is not copyrightable,
|
||||
but just in case, you have unlimited permission to copy, distribute, and
|
||||
|
|
4
INSTALL
|
@ -1,7 +1,7 @@
|
|||
Requirements
|
||||
------------
|
||||
You will need a C99 compiler. (gcc 3.3.6 or newer is recommended).
|
||||
I use gcc 6.1.0 and 4.1.2, but the code should compile with any standards
|
||||
I use gcc 6.1.0 and 3.3.6, but the code should compile with any standards
|
||||
compliant compiler.
|
||||
Gcc is available at http://gcc.gnu.org.
|
||||
|
||||
|
@ -70,7 +70,7 @@ After running 'configure', you can run 'make' and 'make install' as
|
|||
explained above.
|
||||
|
||||
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This file is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
|
|
@ -21,7 +21,7 @@ objs = carg_parser.o lzip_index.o list.o encoder_base.o encoder.o \
|
|||
all : $(progname)
|
||||
|
||||
$(progname) : $(objs)
|
||||
$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(objs)
|
||||
$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(objs)
|
||||
|
||||
main.o : main.c
|
||||
$(CC) $(CPPFLAGS) $(CFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $<
|
||||
|
|
39
NEWS
|
@ -1,36 +1,11 @@
|
|||
Changes in version 1.12:
|
||||
Changes in version 1.13:
|
||||
|
||||
Clzip now reports an error if a file name is empty (clzip -t "").
|
||||
Decompression time has been reduced by 5-12% depending on the file.
|
||||
|
||||
Option '-o, --output' now behaves like '-c, --stdout', but sending the
|
||||
output unconditionally to a file instead of to standard output. See the new
|
||||
description of '-o' in the manual. This change is backwards compatible only
|
||||
when (de)compressing from standard input alone. Therefore commands like:
|
||||
clzip -o foo.lz - bar < foo
|
||||
must now be split into:
|
||||
clzip -o foo.lz - < foo
|
||||
clzip bar
|
||||
or rewritten as:
|
||||
clzip - bar < foo > foo.lz
|
||||
In case of error in a numerical argument to a command line option, clzip
|
||||
now shows the name of the option and the range of valid values.
|
||||
|
||||
When using '-c' or '-o', clzip now checks whether the output is a terminal
|
||||
only once.
|
||||
Several descriptions have been improved in manual, '--help', and man page.
|
||||
|
||||
Clzip now does not even open the output file if the input file is a terminal.
|
||||
|
||||
The words 'decompressed' and 'compressed' have been replaced with the
|
||||
shorter 'out' and 'in' in the verbose output when decompressing or testing.
|
||||
|
||||
Option '--list' now reports corruption or truncation of the last header in a
|
||||
multimember file specifically instead of showing the generic message "Last
|
||||
member in input file is truncated or corrupt."
|
||||
|
||||
The commands needed to extract files from a tar.lz archive have been
|
||||
documented in the manual, in the output of '--help', and in the man page.
|
||||
|
||||
Plzip and tarlz are mentioned in the manual as alternatives for
|
||||
multiprocessors.
|
||||
|
||||
Several fixes and improvements have been made to the manual.
|
||||
|
||||
9 new test files have been added to the testsuite.
|
||||
The texinfo category of the manual has been changed from 'Data Compression'
|
||||
to 'Compression' to match that of gzip. (Reported by Alfred M. Szmidt).
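The NEWS entry about numerical arguments says that an error now names the option and the valid range. A minimal sketch of that behaviour follows. It is not clzip's getnum: the helper name parse_ranged, its signature, and the message wording are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not clzip's getnum): parse 'arg' as a decimal
   integer in [llimit, ulimit]. On failure, write a diagnostic naming the
   option and the valid range into 'msg' and return 0. On success, store
   the value in '*result' and return 1. */
static int parse_ranged( const char * const arg,
                         const char * const option_name,
                         const long llimit, const long ulimit,
                         long * const result,
                         char * const msg, const size_t msg_size )
  {
  char * tail;
  errno = 0;
  const long value = strtol( arg, &tail, 10 );
  if( tail == arg || *tail != 0 || errno == ERANGE ||
      value < llimit || value > ulimit )
    {
    snprintf( msg, msg_size,
              "%s: bad or out-of-range argument '%s' (valid range: %ld to %ld)",
              option_name, arg, llimit, ulimit );
    return 0;
    }
  *result = value;
  return 1;
  }
```

Passing the option name into the helper is what lets the message say which option was wrong, which is presumably why the parser in this commit starts recording a parsed name for every option.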
|
||||
|
|
21
README
|
@ -7,13 +7,14 @@ C++ compiler.
|
|||
|
||||
Lzip is a lossless data compressor with a user interface similar to the one
|
||||
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
|
||||
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
|
||||
interoperability. Lzip can compress about as fast as gzip (lzip -0) or
|
||||
compress most files more than bzip2 (lzip -9). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
|
||||
a data recovery perspective. Lzip has been designed, written, and tested
|
||||
with great care to replace gzip and bzip2 as the standard general-purpose
|
||||
compressed format for unix-like systems.
|
||||
chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
|
||||
checking to maximize interoperability and optimize safety. Lzip can compress
|
||||
about as fast as gzip (lzip -0) or compress most files more than bzip2
|
||||
(lzip -9). Decompression speed is intermediate between gzip and bzip2.
|
||||
Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip
|
||||
has been designed, written, and tested with great care to replace gzip and
|
||||
bzip2 as the standard general-purpose compressed format for unix-like
|
||||
systems.
|
||||
|
||||
For compressing/decompressing large files on multiprocessor machines plzip
|
||||
can be much faster than lzip at the cost of a slightly reduced compression
|
||||
|
@ -72,7 +73,7 @@ filename.lz becomes filename
|
|||
filename.tlz becomes filename.tar
|
||||
anyothername becomes anyothername.out
|
||||
|
||||
(De)compressing a file is much like copying or moving it; therefore clzip
|
||||
(De)compressing a file is much like copying or moving it. Therefore clzip
|
||||
preserves the access and modification dates, permissions, and, when
|
||||
possible, ownership of the file just as 'cp -p' does. (If the user ID or
|
||||
the group ID can't be duplicated, the file permission bits S_ISUID and
|
||||
|
@ -109,7 +110,7 @@ finding coding sequences of minimum size than the one currently used by lzip
|
|||
could be developed, and the resulting sequence could also be coded using the
|
||||
LZMA coding scheme.
|
||||
|
||||
Clzip currently implements two variants of the LZMA algorithm; fast
|
||||
Clzip currently implements two variants of the LZMA algorithm: fast
|
||||
(used by option '-0') and normal (used by all other compression levels).
|
||||
|
||||
The high compression of LZMA comes from combining two basic, well-proven
|
||||
|
@ -129,7 +130,7 @@ been compressed. Decompressed is used to refer to data which have undergone
|
|||
the process of decompression.
|
||||
|
||||
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This file is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
|
110
carg_parser.c
|
@ -1,5 +1,5 @@
|
|||
/* Arg_parser - POSIX/GNU command line argument parser. (C version)
|
||||
Copyright (C) 2006-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2006-2022 Antonio Diaz Diaz.
|
||||
|
||||
This library is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -32,10 +32,10 @@ static void * ap_resize_buffer( void * buf, const int min_size )
|
|||
}
|
||||
|
||||
|
||||
static char push_back_record( struct Arg_parser * const ap,
|
||||
const int code, const char * const argument )
|
||||
static char push_back_record( struct Arg_parser * const ap, const int code,
|
||||
const char * const long_name,
|
||||
const char * const argument )
|
||||
{
|
||||
const int len = strlen( argument );
|
||||
struct ap_Record * p;
|
||||
void * tmp = ap_resize_buffer( ap->data,
|
||||
( ap->data_size + 1 ) * sizeof (struct ap_Record) );
|
||||
|
@ -43,11 +43,29 @@ static char push_back_record( struct Arg_parser * const ap,
|
|||
ap->data = (struct ap_Record *)tmp;
|
||||
p = &(ap->data[ap->data_size]);
|
||||
p->code = code;
|
||||
p->argument = 0;
|
||||
tmp = ap_resize_buffer( p->argument, len + 1 );
|
||||
if( !tmp ) return 0;
|
||||
p->argument = (char *)tmp;
|
||||
strncpy( p->argument, argument, len + 1 );
|
||||
if( long_name )
|
||||
{
|
||||
const int len = strlen( long_name );
|
||||
p->parsed_name = (char *)malloc( len + 2 + 1 );
|
||||
if( !p->parsed_name ) return 0;
|
||||
p->parsed_name[0] = p->parsed_name[1] = '-';
|
||||
strncpy( p->parsed_name + 2, long_name, len + 1 );
|
||||
}
|
||||
else if( code > 0 && code < 256 )
|
||||
{
|
||||
p->parsed_name = (char *)malloc( 2 + 1 );
|
||||
if( !p->parsed_name ) return 0;
|
||||
p->parsed_name[0] = '-'; p->parsed_name[1] = code; p->parsed_name[2] = 0;
|
||||
}
|
||||
else p->parsed_name = 0;
|
||||
if( argument )
|
||||
{
|
||||
const int len = strlen( argument );
|
||||
p->argument = (char *)malloc( len + 1 );
|
||||
if( !p->argument ) { free( p->parsed_name ); return 0; }
|
||||
strncpy( p->argument, argument, len + 1 );
|
||||
}
|
||||
else p->argument = 0;
|
||||
++ap->data_size;
|
||||
return 1;
|
||||
}
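The naming rule that the hunk above adds to push_back_record can be isolated as a small function. This is a sketch only: make_parsed_name is a hypothetical name, and unlike the original it does not distinguish "out of memory" from "no name available" (both return a null pointer).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the naming rule in the hunk above:
   a long name yields "--name", a short code in 1..255 yields "-c",
   anything else yields a null pointer. The caller frees the result. */
static char * make_parsed_name( const int code, const char * const long_name )
  {
  char * name;
  if( long_name )
    {
    const size_t len = strlen( long_name );
    name = (char *)malloc( len + 2 + 1 );   /* "--" + name + terminator */
    if( !name ) return 0;
    name[0] = name[1] = '-';
    memcpy( name + 2, long_name, len + 1 ); /* copies the terminator too */
    return name;
    }
  if( code > 0 && code < 256 )
    {
    name = (char *)malloc( 2 + 1 );         /* '-' + letter + terminator */
    if( !name ) return 0;
    name[0] = '-'; name[1] = (char)code; name[2] = 0;
    return name;
    }
  return 0;                                 /* non-option or long-only code */
  }
```

For example, make_parsed_name( 'o', "output" ) produces "--output", while make_parsed_name( 'v', 0 ) produces "-v".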
|
||||
|
@ -68,12 +86,14 @@ static char add_error( struct Arg_parser * const ap, const char * const msg )
|
|||
static void free_data( struct Arg_parser * const ap )
|
||||
{
|
||||
int i;
|
||||
for( i = 0; i < ap->data_size; ++i ) free( ap->data[i].argument );
|
||||
for( i = 0; i < ap->data_size; ++i )
|
||||
{ free( ap->data[i].argument ); free( ap->data[i].parsed_name ); }
|
||||
if( ap->data ) { free( ap->data ); ap->data = 0; }
|
||||
ap->data_size = 0;
|
||||
}
|
||||
|
||||
|
||||
/* Return 0 only if out of memory. */
|
||||
static char parse_long_option( struct Arg_parser * const ap,
|
||||
const char * const opt, const char * const arg,
|
||||
const struct ap_Option options[],
|
||||
|
@ -87,9 +107,10 @@ static char parse_long_option( struct Arg_parser * const ap,
|
|||
|
||||
/* Test all long options for either exact match or abbreviated matches. */
|
||||
for( i = 0; options[i].code != 0; ++i )
|
||||
if( options[i].name && strncmp( options[i].name, &opt[2], len ) == 0 )
|
||||
if( options[i].long_name &&
|
||||
strncmp( options[i].long_name, &opt[2], len ) == 0 )
|
||||
{
|
||||
if( strlen( options[i].name ) == len ) /* Exact match found */
|
||||
if( strlen( options[i].long_name ) == len ) /* Exact match found */
|
||||
{ index = i; exact = 1; break; }
|
||||
else if( index < 0 ) index = i; /* First nonexact match found */
|
||||
else if( options[index].code != options[i].code ||
|
||||
|
@ -117,35 +138,39 @@ static char parse_long_option( struct Arg_parser * const ap,
|
|||
{
|
||||
if( options[index].has_arg == ap_no )
|
||||
{
|
||||
add_error( ap, "option '--" ); add_error( ap, options[index].name );
|
||||
add_error( ap, "option '--" ); add_error( ap, options[index].long_name );
|
||||
add_error( ap, "' doesn't allow an argument" );
|
||||
return 1;
|
||||
}
|
||||
if( options[index].has_arg == ap_yes && !opt[len+3] )
|
||||
{
|
||||
add_error( ap, "option '--" ); add_error( ap, options[index].name );
|
||||
add_error( ap, "option '--" ); add_error( ap, options[index].long_name );
|
||||
add_error( ap, "' requires an argument" );
|
||||
return 1;
|
||||
}
|
||||
return push_back_record( ap, options[index].code, &opt[len+3] );
|
||||
return push_back_record( ap, options[index].code,
|
||||
options[index].long_name, &opt[len+3] );
|
||||
}
|
||||
|
||||
if( options[index].has_arg == ap_yes )
|
||||
{
|
||||
if( !arg || !arg[0] )
|
||||
{
|
||||
add_error( ap, "option '--" ); add_error( ap, options[index].name );
|
||||
add_error( ap, "option '--" ); add_error( ap, options[index].long_name );
|
||||
add_error( ap, "' requires an argument" );
|
||||
return 1;
|
||||
}
|
||||
++*argindp;
|
||||
return push_back_record( ap, options[index].code, arg );
|
||||
return push_back_record( ap, options[index].code,
|
||||
options[index].long_name, arg );
|
||||
}
|
||||
|
||||
return push_back_record( ap, options[index].code, "" );
|
||||
return push_back_record( ap, options[index].code,
|
||||
options[index].long_name, 0 );
|
||||
}
|
||||
|
||||
|
||||
/* Return 0 only if out of memory. */
|
||||
static char parse_short_option( struct Arg_parser * const ap,
|
||||
const char * const opt, const char * const arg,
|
||||
const struct ap_Option options[],
|
||||
|
@ -156,13 +181,13 @@ static char parse_short_option( struct Arg_parser * const ap,
|
|||
while( cind > 0 )
|
||||
{
|
||||
int index = -1, i;
|
||||
const unsigned char code = opt[cind];
|
||||
const unsigned char c = opt[cind];
|
||||
char code_str[2];
|
||||
code_str[0] = code; code_str[1] = 0;
|
||||
code_str[0] = c; code_str[1] = 0;
|
||||
|
||||
if( code != 0 )
|
||||
if( c != 0 )
|
||||
for( i = 0; options[i].code; ++i )
|
||||
if( code == options[i].code )
|
||||
if( c == options[i].code )
|
||||
{ index = i; break; }
|
||||
|
||||
if( index < 0 )
|
||||
|
@ -176,7 +201,7 @@ static char parse_short_option( struct Arg_parser * const ap,
|
|||
|
||||
if( options[index].has_arg != ap_no && cind > 0 && opt[cind] )
|
||||
{
|
||||
if( !push_back_record( ap, code, &opt[cind] ) ) return 0;
|
||||
if( !push_back_record( ap, c, 0, &opt[cind] ) ) return 0;
|
||||
++*argindp; cind = 0;
|
||||
}
|
||||
else if( options[index].has_arg == ap_yes )
|
||||
|
@ -188,9 +213,9 @@ static char parse_short_option( struct Arg_parser * const ap,
|
|||
return 1;
|
||||
}
|
||||
++*argindp; cind = 0;
|
||||
if( !push_back_record( ap, code, arg ) ) return 0;
|
||||
if( !push_back_record( ap, c, 0, arg ) ) return 0;
|
||||
}
|
||||
else if( !push_back_record( ap, code, "" ) ) return 0;
|
||||
else if( !push_back_record( ap, c, 0, 0 ) ) return 0;
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
|
@ -203,7 +228,7 @@ char ap_init( struct Arg_parser * const ap,
|
|||
const char ** non_options = 0; /* skipped non-options */
|
||||
int non_options_size = 0; /* number of skipped non-options */
|
||||
int argind = 1; /* index in argv */
|
||||
int i;
|
||||
char done = 0; /* false until success */
|
||||
|
||||
ap->data = 0;
|
||||
ap->error = 0;
|
||||
|
@ -223,20 +248,20 @@ char ap_init( struct Arg_parser * const ap,
|
|||
if( ch2 == '-' )
|
||||
{
|
||||
if( !argv[argind][2] ) { ++argind; break; } /* we found "--" */
|
||||
else if( !parse_long_option( ap, opt, arg, options, &argind ) ) return 0;
|
||||
else if( !parse_long_option( ap, opt, arg, options, &argind ) ) goto out;
|
||||
}
|
||||
else if( !parse_short_option( ap, opt, arg, options, &argind ) ) return 0;
|
||||
else if( !parse_short_option( ap, opt, arg, options, &argind ) ) goto out;
|
||||
if( ap->error ) break;
|
||||
}
|
||||
else
|
||||
{
|
||||
if( in_order )
|
||||
{ if( !push_back_record( ap, 0, argv[argind++] ) ) return 0; }
|
||||
{ if( !push_back_record( ap, 0, 0, argv[argind++] ) ) goto out; }
|
||||
else
|
||||
{
|
||||
void * tmp = ap_resize_buffer( non_options,
|
||||
( non_options_size + 1 ) * sizeof *non_options );
|
||||
if( !tmp ) return 0;
|
||||
if( !tmp ) goto out;
|
||||
non_options = (const char **)tmp;
|
||||
non_options[non_options_size++] = argv[argind++];
|
||||
}
|
||||
|
@ -245,13 +270,15 @@ char ap_init( struct Arg_parser * const ap,
|
|||
if( ap->error ) free_data( ap );
|
||||
else
|
||||
{
|
||||
int i;
|
||||
for( i = 0; i < non_options_size; ++i )
|
||||
if( !push_back_record( ap, 0, non_options[i] ) ) return 0;
|
||||
if( !push_back_record( ap, 0, 0, non_options[i] ) ) goto out;
|
||||
while( argind < argc )
|
||||
if( !push_back_record( ap, 0, argv[argind++] ) ) return 0;
|
||||
if( !push_back_record( ap, 0, 0, argv[argind++] ) ) goto out;
|
||||
}
|
||||
if( non_options ) free( non_options );
|
||||
return 1;
|
||||
done = 1;
|
||||
out: if( non_options ) free( non_options );
|
||||
return done;
|
||||
}
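The change from 'return 0' to 'goto out' in ap_init routes every failure through one cleanup point, so the temporary non_options buffer is released on all paths. The same single-exit pattern is sketched below on an unrelated toy function; copy_twice and its behaviour are invented for illustration and are not part of Arg_parser.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example of the single-exit pattern: every failure path
   jumps to 'out', so the temporary buffer is freed exactly once and
   'done' stays 0 unless the whole function succeeded.
   On success '*dst' holds 'src' repeated twice; the caller frees it. */
static char copy_twice( const char * const src, char ** const dst )
  {
  char done = 0;                        /* false until success */
  char * tmp = 0;                       /* temporary working copy */
  const size_t len = strlen( src );

  *dst = 0;
  tmp = (char *)malloc( len + 1 );
  if( !tmp ) goto out;
  memcpy( tmp, src, len + 1 );
  *dst = (char *)malloc( 2 * len + 1 );
  if( !*dst ) goto out;
  memcpy( *dst, tmp, len );
  memcpy( *dst + len, tmp, len + 1 );
  done = 1;
out:
  free( tmp );                          /* free( 0 ) is a no-op */
  return done;
  }
```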
|
||||
|
||||
|
||||
|
@ -273,13 +300,20 @@ int ap_arguments( const struct Arg_parser * const ap )
|
|||
|
||||
int ap_code( const struct Arg_parser * const ap, const int i )
|
||||
{
|
||||
if( i >= 0 && i < ap_arguments( ap ) ) return ap->data[i].code;
|
||||
else return 0;
|
||||
if( i < 0 || i >= ap_arguments( ap ) ) return 0;
|
||||
return ap->data[i].code;
|
||||
}
|
||||
|
||||
|
||||
const char * ap_parsed_name( const struct Arg_parser * const ap, const int i )
|
||||
{
|
||||
if( i < 0 || i >= ap_arguments( ap ) || !ap->data[i].parsed_name ) return "";
|
||||
return ap->data[i].parsed_name;
|
||||
}
|
||||
|
||||
|
||||
const char * ap_argument( const struct Arg_parser * const ap, const int i )
|
||||
{
|
||||
if( i >= 0 && i < ap_arguments( ap ) ) return ap->data[i].argument;
|
||||
else return "";
|
||||
if( i < 0 || i >= ap_arguments( ap ) || !ap->data[i].argument ) return "";
|
||||
return ap->data[i].argument;
|
||||
}
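Because records may now carry a null argument or parsed_name, the accessors above fall back to an empty string instead of handing a null pointer to the caller. A reduced sketch of that contract, using a hypothetical stand-in struct and function rather than the real Arg_parser types:

```c
/* Hypothetical stand-in for struct ap_Record. */
struct record { int code; char * parsed_name; char * argument; };

/* Return the argument of record 'i', or "" when 'i' is out of range or
   the record has no argument. The result is never a null pointer. */
static const char * record_argument( const struct record * const data,
                                     const int size, const int i )
  {
  if( i < 0 || i >= size || !data[i].argument ) return "";
  return data[i].argument;
  }
```

A caller can then test record_argument( data, size, i )[0] directly, without a separate null check.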
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Arg_parser - POSIX/GNU command line argument parser. (C version)
|
||||
Copyright (C) 2006-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2006-2022 Antonio Diaz Diaz.
|
||||
|
||||
This library is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -24,9 +24,9 @@
|
|||
message.
|
||||
|
||||
'options' is an array of 'struct ap_Option' terminated by an element
|
||||
containing a code which is zero. A null name means a short-only
|
||||
option. A code value outside the unsigned char range means a
|
||||
long-only option.
|
||||
containing a code which is zero. A null long_name means a short-only
|
||||
option. A code value outside the unsigned char range means a long-only
|
||||
option.
|
||||
|
||||
Arg_parser normally makes it appear as if all the option arguments
|
||||
were specified before all the non-option arguments for the purposes
|
||||
|
@ -50,7 +50,7 @@ enum ap_Has_arg { ap_no, ap_yes, ap_maybe };
|
|||
struct ap_Option
|
||||
{
|
||||
int code; /* Short option letter or code ( code != 0 ) */
|
||||
const char * name; /* Long option name (maybe null) */
|
||||
const char * long_name; /* Long option name (maybe null) */
|
||||
enum ap_Has_arg has_arg;
|
||||
};
|
||||
|
||||
|
@ -58,6 +58,7 @@ struct ap_Option
|
|||
struct ap_Record
|
||||
{
|
||||
int code;
|
||||
char * parsed_name;
|
||||
char * argument;
|
||||
};
|
||||
|
||||
|
@ -86,6 +87,9 @@ int ap_arguments( const struct Arg_parser * const ap );
|
|||
Else ap_argument( i ) is the option's argument (or empty). */
|
||||
int ap_code( const struct Arg_parser * const ap, const int i );
|
||||
|
||||
/* Full name of the option parsed (short or long). */
|
||||
const char * ap_parsed_name( const struct Arg_parser * const ap, const int i );
|
||||
|
||||
const char * ap_argument( const struct Arg_parser * const ap, const int i );
|
||||
|
||||
#ifdef __cplusplus
|
||||
|
|
6
configure
vendored
|
@ -1,12 +1,12 @@
|
|||
#! /bin/sh
|
||||
# configure script for Clzip - LZMA lossless data compressor
|
||||
# Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
# Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
#
|
||||
# This configure script is free software: you have unlimited permission
|
||||
# to copy, distribute, and modify it.
|
||||
|
||||
pkgname=clzip
|
||||
pkgversion=1.12
|
||||
pkgversion=1.13
|
||||
progname=clzip
|
||||
srctrigger=doc/${pkgname}.texi
|
||||
|
||||
|
@ -167,7 +167,7 @@ echo "LDFLAGS = ${LDFLAGS}"
|
|||
rm -f Makefile
|
||||
cat > Makefile << EOF
|
||||
# Makefile for Clzip - LZMA lossless data compressor
|
||||
# Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
# Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
# This file was generated automatically by configure. Don't edit.
|
||||
#
|
||||
# This Makefile is free software: you have unlimited permission
|
||||
|
|
35
decoder.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -29,8 +29,8 @@
|
|||
#include "decoder.h"
|
||||
|
||||
|
||||
/* Returns the number of bytes really read.
|
||||
If (returned value < size) and (errno == 0), means EOF was reached.
|
||||
/* Return the number of bytes really read.
|
||||
If (value returned < size) and (errno == 0), means EOF was reached.
|
||||
*/
|
||||
int readblock( const int fd, uint8_t * const buf, const int size )
|
||||
{
|
||||
|
@ -48,8 +48,8 @@ int readblock( const int fd, uint8_t * const buf, const int size )
|
|||
}
|
||||
|
||||
|
||||
/* Returns the number of bytes really written.
|
||||
If (returned value < size), it is always an error.
|
||||
/* Return the number of bytes really written.
|
||||
If (value returned < size), it is always an error.
|
||||
*/
|
||||
int writeblock( const int fd, const uint8_t * const buf, const int size )
|
||||
{
|
||||
|
@ -105,8 +105,6 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
int size = Rd_read_data( d->rdec, trailer, Lt_size );
|
||||
const unsigned long long data_size = LZd_data_position( d );
|
||||
const unsigned long long member_size = Rd_member_position( d->rdec );
|
||||
unsigned td_crc;
|
||||
unsigned long long td_size, tm_size;
|
||||
bool error = false;
|
||||
|
||||
if( size < Lt_size )
|
||||
|
@ -121,7 +119,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
while( size < Lt_size ) trailer[size++] = 0;
|
||||
}
|
||||
|
||||
td_crc = Lt_get_data_crc( trailer );
|
||||
const unsigned td_crc = Lt_get_data_crc( trailer );
|
||||
if( td_crc != LZd_crc( d ) )
|
||||
{
|
||||
error = true;
|
||||
|
@ -132,7 +130,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
td_crc, LZd_crc( d ) );
|
||||
}
|
||||
}
|
||||
td_size = Lt_get_data_size( trailer );
|
||||
const unsigned long long td_size = Lt_get_data_size( trailer );
|
||||
if( td_size != data_size )
|
||||
{
|
||||
error = true;
|
||||
|
@ -143,7 +141,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
td_size, td_size, data_size, data_size );
|
||||
}
|
||||
}
|
||||
tm_size = Lt_get_member_size( trailer );
|
||||
const unsigned long long tm_size = Lt_get_member_size( trailer );
|
||||
if( tm_size != member_size )
|
||||
{
|
||||
error = true;
|
||||
|
@ -213,25 +211,19 @@ int LZd_decode_member( struct LZ_decoder * const d,
|
|||
Rd_load( rdec );
|
||||
while( !Rd_finished( rdec ) )
|
||||
{
|
||||
int len;
|
||||
const int pos_state = LZd_data_position( d ) & pos_state_mask;
|
||||
if( Rd_decode_bit( rdec, &bm_match[state][pos_state] ) == 0 ) /* 1st bit */
|
||||
{
|
||||
/* literal byte */
|
||||
Bit_model * const bm = bm_literal[get_lit_state(LZd_peek_prev( d ))];
|
||||
if( St_is_char( state ) )
|
||||
{
|
||||
state -= ( state < 4 ) ? state : 3;
|
||||
if( ( state = St_set_char( state ) ) < 4 )
|
||||
LZd_put_byte( d, Rd_decode_tree8( rdec, bm ) );
|
||||
}
|
||||
else
|
||||
{
|
||||
state -= ( state < 10 ) ? 3 : 6;
|
||||
LZd_put_byte( d, Rd_decode_matched( rdec, bm, LZd_peek( d, rep0 ) ) );
|
||||
}
|
||||
continue;
|
||||
}
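The literal branch above replaces two arithmetic state updates with a single call to St_set_char and reuses its result to choose between the plain and the matched literal decoder. St_set_char itself is not shown in this diff. The sketch below assumes it is a 12-entry lookup table and that "previous symbol was a literal" means a state below 7; both are assumptions, and the function names are invented.

```c
enum { states = 12 };   /* assumed number of coder states */

/* Assumed table form of the "a literal was decoded" transition. */
static int set_char_table( const int st )
  {
  static const int next[states] = { 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 4, 5 };
  return next[st];
  }

/* The arithmetic form removed by the hunk, kept here for comparison. */
static int set_char_arith( int st )
  {
  if( st < 7 ) st -= ( st < 4 ) ? st : 3;   /* previous symbol was a literal */
  else st -= ( st < 10 ) ? 3 : 6;           /* previous symbol was a match */
  return st;
  }
```

Under these assumptions the two forms agree for every state, and the new state is below 4 exactly when the old state was below 7, which is why the test '( state = St_set_char( state ) ) < 4' can stand in for the old St_is_char check.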
|
||||
/* match or repeated match */
|
||||
int len;
|
||||
if( Rd_decode_bit( rdec, &bm_rep[state] ) != 0 ) /* 2nd bit */
|
||||
{
|
||||
if( Rd_decode_bit( rdec, &bm_rep0[state] ) == 0 ) /* 3rd bit */
|
||||
|
@ -257,13 +249,12 @@ int LZd_decode_member( struct LZ_decoder * const d,
|
|||
rep0 = distance;
|
||||
}
|
||||
state = St_set_rep( state );
|
||||
len = min_match_len + Rd_decode_len( rdec, &rep_len_model, pos_state );
|
||||
len = Rd_decode_len( rdec, &rep_len_model, pos_state );
|
||||
}
|
||||
else /* match */
|
||||
{
|
||||
unsigned distance;
|
||||
len = min_match_len + Rd_decode_len( rdec, &match_len_model, pos_state );
|
||||
distance = Rd_decode_tree6( rdec, bm_dis_slot[get_len_state(len)] );
|
||||
len = Rd_decode_len( rdec, &match_len_model, pos_state );
|
||||
unsigned distance = Rd_decode_tree6( rdec, bm_dis_slot[get_len_state(len)] );
|
||||
if( distance >= start_dis_model )
|
||||
{
|
||||
const unsigned dis_slot = distance;
|
||||
|
|
116
decoder.h
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -101,12 +101,11 @@ static inline unsigned Rd_decode( struct Range_decoder * const rdec,
|
|||
int i;
|
||||
for( i = num_bits; i > 0; --i )
|
||||
{
|
||||
bool bit;
|
||||
Rd_normalize( rdec );
|
||||
rdec->range >>= 1;
|
||||
/* symbol <<= 1; */
|
||||
/* if( rdec->code >= rdec->range ) { rdec->code -= rdec->range; symbol |= 1; } */
|
||||
bit = ( rdec->code >= rdec->range );
|
||||
const bool bit = ( rdec->code >= rdec->range );
|
||||
symbol <<= 1; symbol += bit;
|
||||
rdec->code -= rdec->range & ( 0U - bit );
|
||||
}
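The loop above decodes direct bits without a branch: '0U - bit' is zero when bit is 0 and all ones when bit is 1, so the AND either discards or keeps 'range' before the subtraction. The sketch below isolates that step. It omits Rd_normalize entirely, and the name decode_direct and its pointer parameters are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical reduction of the direct-bit loop: no normalization, and
   'code' and 'range' are passed by pointer instead of living in a struct. */
static unsigned decode_direct( uint32_t * const code, uint32_t * const range,
                               const int num_bits )
  {
  unsigned symbol = 0;
  for( int i = num_bits; i > 0; --i )
    {
    *range >>= 1;
    const unsigned bit = ( *code >= *range );   /* 0 or 1 */
    symbol <<= 1; symbol += bit;
    *code -= *range & ( 0U - bit );             /* subtract only if bit == 1 */
    }
  return symbol;
  }
```

With range starting at a power of two, 2^n, and code below it, decoding n bits simply returns the original code value, which makes the step easy to check by hand.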
|
||||
|
@ -116,42 +115,75 @@ static inline unsigned Rd_decode( struct Range_decoder * const rdec,
|
|||
static inline unsigned Rd_decode_bit( struct Range_decoder * const rdec,
|
||||
Bit_model * const probability )
|
||||
{
|
||||
uint32_t bound;
|
||||
Rd_normalize( rdec );
|
||||
bound = ( rdec->range >> bit_model_total_bits ) * *probability;
|
||||
const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability;
|
||||
if( rdec->code < bound )
|
||||
{
|
||||
rdec->range = bound;
|
||||
*probability += (bit_model_total - *probability) >> bit_model_move_bits;
|
||||
*probability += ( bit_model_total - *probability ) >> bit_model_move_bits;
|
||||
return 0;
|
||||
}
|
||||
else
|
||||
{
|
||||
rdec->range -= bound;
|
||||
rdec->code -= bound;
|
||||
rdec->range -= bound;
|
||||
*probability -= *probability >> bit_model_move_bits;
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
static inline unsigned Rd_decode_tree3( struct Range_decoder * const rdec,
|
||||
Bit_model bm[] )
|
||||
static inline void Rd_decode_symbol_bit( struct Range_decoder * const rdec,
|
||||
Bit_model * const probability, unsigned * symbol )
|
||||
{
|
||||
unsigned symbol = 2 | Rd_decode_bit( rdec, &bm[1] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
return symbol & 7;
|
||||
Rd_normalize( rdec );
|
||||
*symbol <<= 1;
|
||||
const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability;
|
||||
if( rdec->code < bound )
|
||||
{
|
||||
rdec->range = bound;
|
||||
*probability += ( bit_model_total - *probability ) >> bit_model_move_bits;
|
||||
}
|
||||
else
|
||||
{
|
||||
rdec->code -= bound;
|
||||
rdec->range -= bound;
|
||||
*probability -= *probability >> bit_model_move_bits;
|
||||
*symbol |= 1;
|
||||
}
|
||||
}
|
||||
|
||||
static inline void Rd_decode_symbol_bit_reversed( struct Range_decoder * const rdec,
|
||||
Bit_model * const probability, unsigned * model,
|
||||
unsigned * symbol, const int i )
|
||||
{
|
||||
Rd_normalize( rdec );
|
||||
*model <<= 1;
|
||||
const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability;
|
||||
if( rdec->code < bound )
|
||||
{
|
||||
rdec->range = bound;
|
||||
*probability += ( bit_model_total - *probability ) >> bit_model_move_bits;
|
||||
}
|
||||
else
|
||||
{
|
||||
rdec->code -= bound;
|
||||
rdec->range -= bound;
|
||||
*probability -= *probability >> bit_model_move_bits;
|
||||
*model |= 1;
|
||||
*symbol |= 1 << i;
|
||||
}
|
||||
}
|
||||
|
||||
static inline unsigned Rd_decode_tree6( struct Range_decoder * const rdec,
|
||||
Bit_model bm[] )
|
||||
{
|
||||
unsigned symbol = 2 | Rd_decode_bit( rdec, &bm[1] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
unsigned symbol = 1;
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
return symbol & 0x3F;
|
||||
}
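Rd_decode_tree6 now starts from symbol = 1 and shifts in six bits, where the old code started from '2 | first_bit' and shifted in five more. The sketch below checks that the two forms yield the same 6-bit value. The range decoder is replaced by a hypothetical bit_source that replays recorded bits, and loops are used where the original is unrolled.

```c
/* Stand-in for the range decoder: yields pre-recorded bits in order.
   Hypothetical; the real decoder derives each bit from an adaptive
   probability selected by the current value of 'symbol'. */
struct bit_source { const unsigned char * bits; int pos; };

static unsigned next_bit( struct bit_source * const src )
  { return src->bits[src->pos++]; }

/* New form: start at the tree root (1), shift each bit in, mask at the end. */
static unsigned tree6_new( struct bit_source * const src )
  {
  unsigned symbol = 1;
  for( int i = 0; i < 6; ++i ) symbol = ( symbol << 1 ) | next_bit( src );
  return symbol & 0x3F;
  }

/* Old form: fold the root and the first bit into the initial value. */
static unsigned tree6_old( struct bit_source * const src )
  {
  unsigned symbol = 2 | next_bit( src );
  for( int i = 0; i < 5; ++i ) symbol = ( symbol << 1 ) | next_bit( src );
  return symbol & 0x3F;
  }
```

In both forms the value of 'symbol' before each step is also the index of the bit model that would be consulted, so the leading 1 acts as a marker of the current tree depth and is removed by the final mask.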
|
||||
|
||||
|
@ -159,9 +191,14 @@ static inline unsigned Rd_decode_tree8( struct Range_decoder * const rdec,
|
|||
Bit_model bm[] )
|
||||
{
|
||||
unsigned symbol = 1;
|
||||
int i;
|
||||
for( i = 0; i < 8; ++i )
|
||||
symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
return symbol & 0xFF;
|
||||
}
|
||||
|
||||
|
@ -173,21 +210,19 @@ Rd_decode_tree_reversed( struct Range_decoder * const rdec,
|
|||
unsigned symbol = 0;
|
||||
int i;
|
||||
for( i = 0; i < num_bits; ++i )
|
||||
{
|
||||
const unsigned bit = Rd_decode_bit( rdec, &bm[model] );
|
||||
model <<= 1; model += bit;
|
||||
symbol |= ( bit << i );
|
||||
}
|
||||
Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, i );
|
||||
return symbol;
|
||||
}
|
||||
|
||||
static inline unsigned
|
||||
Rd_decode_tree_reversed4( struct Range_decoder * const rdec, Bit_model bm[] )
|
||||
{
|
||||
unsigned symbol = Rd_decode_bit( rdec, &bm[1] );
|
||||
symbol += Rd_decode_bit( rdec, &bm[2+symbol] ) << 1;
|
||||
symbol += Rd_decode_bit( rdec, &bm[4+symbol] ) << 2;
|
||||
symbol += Rd_decode_bit( rdec, &bm[8+symbol] ) << 3;
|
||||
unsigned model = 1;
|
||||
unsigned symbol = 0;
|
||||
Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 0 );
|
||||
Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 1 );
|
||||
Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 2 );
|
||||
Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 3 );
|
||||
return symbol;
|
||||
}
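In the reversed tree, 'model' walks the tree most-significant-bit first while 'symbol' is assembled least-significant-bit first. A sketch of that bookkeeping follows, with the range decoder replaced by a plain array of four recorded bits; tree_reversed4_sketch and its parameters are hypothetical.

```c
/* Hypothetical reduction of the reversed 4-bit tree: 'bits' replays the
   decoded bits in order, and 'last_model', if not null, receives the
   final value of the tree index. */
static unsigned tree_reversed4_sketch( const unsigned char bits[4],
                                       unsigned * const last_model )
  {
  unsigned model = 1;           /* tree index, grows MSB first */
  unsigned symbol = 0;          /* result, filled LSB first */
  for( int i = 0; i < 4; ++i )
    {
    const unsigned bit = bits[i];
    model = ( model << 1 ) | bit;
    symbol |= bit << i;
    }
  if( last_model ) *last_model = model;
  return symbol;
  }
```

For the bit sequence 1, 0, 1, 1 the tree index goes 1, 3, 6, 13, 27 and the returned symbol is 1 + 4 + 8 = 13.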
|
||||
|
||||
|
@ -210,11 +245,24 @@ static inline unsigned Rd_decode_len( struct Range_decoder * const rdec,
|
|||
struct Len_model * const lm,
|
||||
const int pos_state )
|
||||
{
|
||||
Bit_model * bm;
|
||||
unsigned mask, offset, symbol = 1;
|
||||
|
||||
if( Rd_decode_bit( rdec, &lm->choice1 ) == 0 )
|
||||
return Rd_decode_tree3( rdec, lm->bm_low[pos_state] );
|
||||
{ bm = lm->bm_low[pos_state]; mask = 7; offset = 0; goto len3; }
|
||||
if( Rd_decode_bit( rdec, &lm->choice2 ) == 0 )
|
||||
return len_low_symbols + Rd_decode_tree3( rdec, lm->bm_mid[pos_state] );
|
||||
return len_low_symbols + len_mid_symbols + Rd_decode_tree8( rdec, lm->bm_high );
|
||||
{ bm = lm->bm_mid[pos_state]; mask = 7; offset = len_low_symbols; goto len3; }
|
||||
bm = lm->bm_high; mask = 0xFF; offset = len_low_symbols + len_mid_symbols;
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
len3:
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol );
|
||||
return ( symbol & mask ) + min_match_len + offset;
|
||||
}
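Note on the Rd_decode_len hunk above: the new code appears to rely on a bit-tree property. If 'symbol' starts at 1 and each decoded bit is appended to it, the low n bits of 'symbol' hold the decoded value after n bits, so a single mask (7 or 0xFF) can serve both the 3-bit and the 8-bit trees. The following is a minimal sketch in C of that property only. It assumes a hypothetical 'Bit_source' and 'next_bit' that return preset bits in place of the range decoder, and it does not model the Bit_model contexts.

```c
#include <assert.h>

/* Hypothetical stand-in for the range decoder: returns preset bits.
   The real Rd_decode_symbol_bit also selects a Bit_model using 'symbol'. */
struct Bit_source { const unsigned char * bits; int pos; };

static unsigned next_bit( struct Bit_source * const src )
  { return src->bits[src->pos++] & 1; }

/* Decode 'num_bits' bits, most significant first, through a bit tree.
   'symbol' starts at 1 so that it can double as the index of the tree node. */
static unsigned decode_tree( struct Bit_source * const src, const int num_bits )
  {
  unsigned symbol = 1;
  int i;
  assert( num_bits > 0 && num_bits < 32 );
  for( i = 0; i < num_bits; ++i )
    symbol = ( symbol << 1 ) | next_bit( src );
  return symbol & ( ( 1u << num_bits ) - 1 );	/* mask off the leading 1 */
  }
```

With the preset bits 1, 0, 1 a call to decode_tree( &src, 3 ) yields 5. In Rd_decode_len the masked value then has min_match_len and the per-range offset added to it.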
|
||||
|
||||
|
||||
|
|
21
doc/clzip.1
|
@ -1,5 +1,5 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.16.
|
||||
.TH CLZIP "1" "January 2021" "clzip 1.12" "User Commands"
|
||||
.TH CLZIP "1" "January 2022" "clzip 1.13" "User Commands"
|
||||
.SH NAME
|
||||
clzip \- reduces the size of files
|
||||
.SH SYNOPSIS
|
||||
|
@ -13,13 +13,14 @@ C++ compiler.
|
|||
.PP
|
||||
Lzip is a lossless data compressor with a user interface similar to the one
|
||||
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov
|
||||
chain\-Algorithm' (LZMA) stream format, chosen to maximize safety and
|
||||
interoperability. Lzip can compress about as fast as gzip (lzip \fB\-0\fR) or
|
||||
compress most files more than bzip2 (lzip \fB\-9\fR). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
|
||||
a data recovery perspective. Lzip has been designed, written, and tested
|
||||
with great care to replace gzip and bzip2 as the standard general\-purpose
|
||||
compressed format for unix\-like systems.
|
||||
chain\-Algorithm' (LZMA) stream format and provides a 3 factor integrity
|
||||
checking to maximize interoperability and optimize safety. Lzip can compress
|
||||
about as fast as gzip (lzip \fB\-0\fR) or compress most files more than bzip2
|
||||
(lzip \fB\-9\fR). Decompression speed is intermediate between gzip and bzip2.
|
||||
Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip
|
||||
has been designed, written, and tested with great care to replace gzip and
|
||||
bzip2 as the standard general\-purpose compressed format for unix\-like
|
||||
systems.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-h\fR, \fB\-\-help\fR
|
||||
|
@ -102,7 +103,7 @@ To extract all the files from archive 'foo.tar.lz', use the commands
|
|||
.PP
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file
|
||||
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
|
||||
invalid input file, 3 for an internal consistency error (eg, bug) which
|
||||
invalid input file, 3 for an internal consistency error (e.g., bug) which
|
||||
caused clzip to panic.
|
||||
.PP
|
||||
The ideas embodied in clzip are due to (at least) the following people:
|
||||
|
@ -115,7 +116,7 @@ Report bugs to lzip\-bug@nongnu.org
|
|||
.br
|
||||
Clzip home page: http://www.nongnu.org/lzip/clzip.html
|
||||
.SH COPYRIGHT
|
||||
Copyright \(co 2021 Antonio Diaz Diaz.
|
||||
Copyright \(co 2022 Antonio Diaz Diaz.
|
||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
|
|
344
doc/clzip.info
|
@ -1,6 +1,6 @@
|
|||
This is clzip.info, produced by makeinfo version 4.13+ from clzip.texi.
|
||||
|
||||
INFO-DIR-SECTION Data Compression
|
||||
INFO-DIR-SECTION Compression
|
||||
START-INFO-DIR-ENTRY
|
||||
* Clzip: (clzip). LZMA lossless data compressor
|
||||
END-INFO-DIR-ENTRY
|
||||
|
@ -11,7 +11,7 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Clzip Manual
|
||||
************
|
||||
|
||||
This manual is for Clzip (version 1.12, 4 January 2021).
|
||||
This manual is for Clzip (version 1.13, 24 January 2022).
|
||||
|
||||
* Menu:
|
||||
|
||||
|
@ -19,8 +19,8 @@ This manual is for Clzip (version 1.12, 4 January 2021).
|
|||
* Output:: Meaning of clzip's output
|
||||
* Invoking clzip:: Command line interface
|
||||
* Quality assurance:: Design, development, and testing of lzip
|
||||
* File format:: Detailed format of the compressed file
|
||||
* Algorithm:: How clzip compresses the data
|
||||
* File format:: Detailed format of the compressed file
|
||||
* Stream format:: Format of the LZMA stream in lzip files
|
||||
* Trailing data:: Extra data appended to the file
|
||||
* Examples:: A small tutorial with examples
|
||||
|
@ -29,7 +29,7 @@ This manual is for Clzip (version 1.12, 4 January 2021).
|
|||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
@ -47,13 +47,14 @@ C++ compiler.
|
|||
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
|
||||
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
|
||||
interoperability. Lzip can compress about as fast as gzip (lzip -0) or
|
||||
compress most files more than bzip2 (lzip -9). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
|
||||
a data recovery perspective. Lzip has been designed, written, and tested
|
||||
with great care to replace gzip and bzip2 as the standard general-purpose
|
||||
compressed format for unix-like systems.
|
||||
chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
|
||||
checking to maximize interoperability and optimize safety. Lzip can compress
|
||||
about as fast as gzip (lzip -0) or compress most files more than bzip2
|
||||
(lzip -9). Decompression speed is intermediate between gzip and bzip2. Lzip
|
||||
is better than gzip and bzip2 from a data recovery perspective. Lzip has
|
||||
been designed, written, and tested with great care to replace gzip and
|
||||
bzip2 as the standard general-purpose compressed format for unix-like
|
||||
systems.
|
||||
|
||||
For compressing/decompressing large files on multiprocessor machines
|
||||
plzip can be much faster than lzip at the cost of a slightly reduced
|
||||
|
@ -91,9 +92,9 @@ byte near the beginning is a thing of the past.
|
|||
|
||||
The member trailer stores the 32-bit CRC of the original data, the size
|
||||
of the original data, and the size of the member. These values, together
|
||||
with the end-of-stream marker, provide a 3 factor integrity checking which
|
||||
guarantees that the decompressed version of the data is identical to the
|
||||
original. This guards against corruption of the compressed data, and
|
||||
with the "End Of Stream" marker, provide a 3 factor integrity checking
|
||||
which guarantees that the decompressed version of the data is identical to
|
||||
the original. This guards against corruption of the compressed data, and
|
||||
against undetected bugs in clzip (hopefully very unlikely). The chances of
|
||||
data corruption going undetected are microscopic. Be aware, though, that
|
||||
the check occurs upon decompression, so it can only tell you that something
|
||||
|
@ -124,7 +125,7 @@ filename.lz becomes filename
|
|||
filename.tlz becomes filename.tar
|
||||
anyothername becomes anyothername.out
|
||||
|
||||
(De)compressing a file is much like copying or moving it; therefore clzip
|
||||
(De)compressing a file is much like copying or moving it. Therefore clzip
|
||||
preserves the access and modification dates, permissions, and, when
|
||||
possible, ownership of the file just as 'cp -p' does. (If the user ID or
|
||||
the group ID can't be duplicated, the file permission bits S_ISUID and
|
||||
|
@ -252,10 +253,13 @@ once, the first time it appears in the command line.
|
|||
|
||||
'-d'
|
||||
'--decompress'
|
||||
Decompress the files specified. If a file does not exist or can't be
|
||||
opened, clzip continues decompressing the rest of the files. If a file
|
||||
fails to decompress, or is a terminal, clzip exits immediately without
|
||||
decompressing the rest of the files.
|
||||
Decompress the files specified. If a file does not exist, can't be
|
||||
opened, or the destination file already exists and '--force' has not
|
||||
been specified, clzip continues decompressing the rest of the files
|
||||
and exits with error status 1. If a file fails to decompress, or is a
|
||||
terminal, clzip exits immediately with error status 2 without
|
||||
decompressing the rest of the files. A terminal is considered an
|
||||
uncompressed file, and therefore invalid.
|
||||
|
||||
'-f'
|
||||
'--force'
|
||||
|
@ -281,10 +285,12 @@ once, the first time it appears in the command line.
|
|||
positions and sizes of each member in multimember files are also
|
||||
printed.
|
||||
|
||||
'-lq' can be used to verify quickly (without decompressing) the
|
||||
structural integrity of the files specified. (Use '--test' to verify
|
||||
the data integrity). '-alq' additionally verifies that none of the
|
||||
files specified contain trailing data.
|
||||
If any file is damaged, does not exist, can't be opened, or is not
|
||||
regular, the final exit status will be > 0. '-lq' can be used to verify
|
||||
quickly (without decompressing) the structural integrity of the files
|
||||
specified. (Use '--test' to verify the data integrity). '-alq'
|
||||
additionally verifies that none of the files specified contain
|
||||
trailing data.
|
||||
|
||||
'-m BYTES'
|
||||
'--match-length=BYTES'
|
||||
|
@ -423,11 +429,11 @@ Y yottabyte (10^24) | Yi yobibyte (2^80)
|
|||
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file not
|
||||
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
|
||||
input file, 3 for an internal consistency error (eg, bug) which caused
|
||||
input file, 3 for an internal consistency error (e.g., bug) which caused
|
||||
clzip to panic.
|
||||
|
||||
|
||||
File: clzip.info, Node: Quality assurance, Next: File format, Prev: Invoking clzip, Up: Top
|
||||
File: clzip.info, Node: Quality assurance, Next: Algorithm, Prev: Invoking clzip, Up: Top
|
||||
|
||||
4 Design, development, and testing of lzip
|
||||
******************************************
|
||||
|
@ -575,12 +581,13 @@ extraction of the decompressed data.
|
|||
=============================
|
||||
|
||||
'Accurate and robust error detection'
|
||||
The lzip format provides 3 factor integrity checking and the
|
||||
decompressors report mismatches in each factor separately. This way if
|
||||
just one byte in one factor fails but the other two factors match the
|
||||
data, it probably means that the data are intact and the corruption
|
||||
just affects the mismatching factor (CRC or data size) in the check
|
||||
sequence.
|
||||
The lzip format provides 3 factor integrity checking, and the
|
||||
decompressors report mismatches in each factor separately. This method
|
||||
detects most false positives for corruption. If just one byte in one
|
||||
factor fails but the other two factors match the data, it probably
|
||||
means that the data are intact and the corruption just affects the
|
||||
mismatching factor (CRC, data size, or member size) in the member
|
||||
trailer.
|
||||
|
||||
'Multiple implementations'
|
||||
Just like the lzip format provides 3 factor protection against
|
||||
|
@ -614,82 +621,9 @@ extraction of the decompressed data.
|
|||
|
||||
|
||||
|
||||
File: clzip.info, Node: File format, Next: Algorithm, Prev: Quality assurance, Up: Top
|
||||
File: clzip.info, Node: Algorithm, Next: File format, Prev: Quality assurance, Up: Top
|
||||
|
||||
5 File format
|
||||
*************
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
when there is no longer anything to take away.
|
||||
-- Antoine de Saint-Exupery
|
||||
|
||||
|
||||
In the diagram below, a box like this:
|
||||
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
|
||||
represents one byte; a box like this:
|
||||
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
|
||||
represents a variable number of bytes.
|
||||
|
||||
|
||||
A lzip file consists of a series of "members" (compressed data sets).
|
||||
The members simply appear one after another in the file, with no additional
|
||||
information before, between, or after them.
|
||||
|
||||
Each member has the following structure:
|
||||
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
|
||||
All multibyte values are stored in little endian order.
|
||||
|
||||
'ID string (the "magic" bytes)'
|
||||
A four byte string, identifying the lzip format, with the value "LZIP"
|
||||
(0x4C, 0x5A, 0x49, 0x50).
|
||||
|
||||
'VN (version number, 1 byte)'
|
||||
Just in case something needs to be modified in the future. 1 for now.
|
||||
|
||||
'DS (coded dictionary size, 1 byte)'
|
||||
The dictionary size is calculated by taking a power of 2 (the base
|
||||
size) and subtracting from it a fraction between 0/16 and 7/16 of the
|
||||
base size.
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
'LZMA stream'
|
||||
The LZMA stream, finished by an end of stream marker. Uses default
|
||||
values for encoder properties. *Note Stream format::, for a complete
|
||||
description.
|
||||
|
||||
'CRC32 (4 bytes)'
|
||||
Cyclic Redundancy Check (CRC) of the uncompressed original data.
|
||||
|
||||
'Data size (8 bytes)'
|
||||
Size of the uncompressed original data.
|
||||
|
||||
'Member size (8 bytes)'
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, allows the verification of stream integrity,
|
||||
and facilitates safe recovery of undamaged members from multimember
|
||||
files.
|
||||
|
||||
|
||||
|
||||
File: clzip.info, Node: Algorithm, Next: Stream format, Prev: File format, Up: Top
|
||||
|
||||
6 Algorithm
|
||||
5 Algorithm
|
||||
***********
|
||||
|
||||
In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
|
||||
|
@ -704,7 +638,7 @@ of finding coding sequences of minimum size than the one currently used by
|
|||
clzip could be developed, and the resulting sequence could also be coded
|
||||
using the LZMA coding scheme.
|
||||
|
||||
Clzip currently implements two variants of the LZMA algorithm; fast
|
||||
Clzip currently implements two variants of the LZMA algorithm: fast
|
||||
(used by option '-0') and normal (used by all other compression levels).
|
||||
|
||||
The high compression of LZMA comes from combining two basic, well-proven
|
||||
|
@ -716,7 +650,7 @@ contexts according to what the bits are used for.
|
|||
Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
|
||||
which reduces redundancy by translating chunks of data to their
|
||||
corresponding distance-length pairs. The second stage is a range encoder
|
||||
that uses a different probability model for each type of data; distances,
|
||||
that uses a different probability model for each type of data: distances,
|
||||
lengths, literal bytes, etc.
|
||||
|
||||
Here is how it works, step by step:
|
||||
|
@ -762,17 +696,90 @@ encoding), Igor Pavlov (for putting all the above together in LZMA), and
|
|||
Julian Seward (for bzip2's CLI).
|
||||
|
||||
|
||||
File: clzip.info, Node: Stream format, Next: Trailing data, Prev: Algorithm, Up: Top
|
||||
File: clzip.info, Node: File format, Next: Stream format, Prev: Algorithm, Up: Top
|
||||
|
||||
6 File format
|
||||
*************
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
when there is no longer anything to take away.
|
||||
-- Antoine de Saint-Exupery
|
||||
|
||||
|
||||
In the diagram below, a box like this:
|
||||
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
|
||||
represents one byte; a box like this:
|
||||
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
|
||||
represents a variable number of bytes.
|
||||
|
||||
|
||||
A lzip file consists of a series of independent "members" (compressed
|
||||
data sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them. Each member can
|
||||
encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
|
||||
size of a multimember file is unlimited.
|
||||
|
||||
Each member has the following structure:
|
||||
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
|
||||
All multibyte values are stored in little endian order.
|
||||
|
||||
'ID string (the "magic" bytes)'
|
||||
A four byte string, identifying the lzip format, with the value "LZIP"
|
||||
(0x4C, 0x5A, 0x49, 0x50).
|
||||
|
||||
'VN (version number, 1 byte)'
|
||||
Just in case something needs to be modified in the future. 1 for now.
|
||||
|
||||
'DS (coded dictionary size, 1 byte)'
|
||||
The dictionary size is calculated by taking a power of 2 (the base
|
||||
size) and subtracting from it a fraction between 0/16 and 7/16 of the
|
||||
base size.
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
'LZMA stream'
|
||||
The LZMA stream, finished by an "End Of Stream" marker. Uses default
|
||||
values for encoder properties. *Note Stream format::, for a complete
|
||||
description.
|
||||
|
||||
'CRC32 (4 bytes)'
|
||||
Cyclic Redundancy Check (CRC) of the original uncompressed data.
|
||||
|
||||
'Data size (8 bytes)'
|
||||
Size of the original uncompressed data.
|
||||
|
||||
'Member size (8 bytes)'
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, allows the verification of stream integrity,
|
||||
and facilitates the safe recovery of undamaged members from
|
||||
multimember files. Member size should be limited to 2 PiB to prevent
|
||||
the data size field from overflowing.
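Note on the 'DS (coded dictionary size, 1 byte)' field described above: the decoding rule can be sketched in C as follows. This is an illustration written from the description in the manual text, not code taken from clzip. The function name 'decode_dict_size' and the convention of returning 0 for an invalid value are assumptions.

```c
#include <stdint.h>

/* Decode the coded dictionary size byte (DS) of a member header.
   Returns the dictionary size in bytes, or 0 (an assumed convention)
   if the result falls outside the valid range of 4 KiB to 512 MiB. */
static uint32_t decode_dict_size( const uint8_t ds )
  {
  const unsigned log2_base = ds & 0x1F;		/* bits 4-0: log2 of base size */
  const unsigned numerator = ds >> 5;		/* bits 7-5: sixteenths to subtract */
  uint32_t size;

  if( log2_base < 12 || log2_base > 29 ) return 0;
  size = (uint32_t)1 << log2_base;
  size -= ( size / 16 ) * numerator;
  if( size < ( (uint32_t)1 << 12 ) ) return 0;	/* below 4 KiB */
  return size;
  }
```

For the example in the text, decode_dict_size( 0xD3 ) gives 2^19 - 6 * 2^15 = 327680 bytes (320 KiB).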
|
||||
|
||||
|
||||
|
||||
File: clzip.info, Node: Stream format, Next: Trailing data, Prev: File format, Up: Top
|
||||
|
||||
7 Format of the LZMA stream in lzip files
|
||||
*****************************************
|
||||
|
||||
Lzip uses a simplified form of the LZMA stream format chosen to maximize
|
||||
safety and interoperability.
|
||||
|
||||
The LZMA algorithm has three parameters, called "special LZMA
|
||||
properties", to adjust it for some kinds of binary data. These parameters
|
||||
are; 'literal_context_bits' (with a default value of 3),
|
||||
The LZMA algorithm has three parameters, called "special LZMA properties",
|
||||
to adjust it for some kinds of binary data. These parameters are:
|
||||
'literal_context_bits' (with a default value of 3),
|
||||
'literal_pos_state_bits' (with a default value of 0), and 'pos_state_bits'
|
||||
(with a default value of 2). As a general purpose compressor, lzip only
|
||||
uses the default values for these parameters. In particular
|
||||
|
@ -782,12 +789,14 @@ in the code.
|
|||
Lzip finishes the LZMA stream with an "End Of Stream" (EOS) marker (the
|
||||
distance-length pair 0xFFFFFFFFU, 2), which in conjunction with the 'member
|
||||
size' field in the member trailer allows the verification of stream
|
||||
integrity. The LZMA stream in lzip files always has these two features
|
||||
(default properties and EOS marker) and is referred to in this document as
|
||||
LZMA-302eos. The EOS marker is the only marker allowed in lzip files.
|
||||
integrity. The EOS marker is the only marker allowed in lzip files. The
|
||||
LZMA stream in lzip files always has these two features (default properties
|
||||
and EOS marker) and is referred to in this document as LZMA-302eos. This
|
||||
simplified form of the LZMA stream format has been chosen to maximize
|
||||
interoperability and safety.
|
||||
|
||||
The second stage of LZMA is a range encoder that uses a different
|
||||
probability model for each type of symbol; distances, lengths, literal
|
||||
probability model for each type of symbol: distances, lengths, literal
|
||||
bytes, etc. Range encoding conceptually encodes all the symbols of the
|
||||
message into one number. Unlike Huffman coding, which assigns to each
|
||||
symbol a bit-pattern and concatenates all the bit-patterns together, range
|
||||
|
@ -795,16 +804,16 @@ encoding can compress one symbol to less than one bit. Therefore the
|
|||
compressed data produced by a range encoder can't be split in pieces that
|
||||
could be described individually.
|
||||
|
||||
It seems that the only way of describing the LZMA-302eos stream is
|
||||
describing the algorithm that decodes it. And given the many details about
|
||||
It seems that the only way of describing the LZMA-302eos stream is to
|
||||
describe the algorithm that decodes it. And given the many details about
|
||||
the range decoder that need to be described accurately, the source code of
|
||||
a real decoder seems the only appropriate reference to use.
|
||||
a real decompressor seems the only appropriate reference to use.
|
||||
|
||||
What follows is a description of the decoding algorithm for LZMA-302eos
|
||||
streams using as reference the source code of "lzd", an educational
|
||||
decompressor for lzip files which can be downloaded from the lzip download
|
||||
directory. The source code of lzd is included in appendix A. *Note
|
||||
Reference source code::.
|
||||
directory. Lzd is written in C++11 and its source code is included in
|
||||
appendix A. *Note Reference source code::.
|
||||
|
||||
|
||||
7.1 What is coded
|
||||
|
@ -840,7 +849,7 @@ Bit sequence Description
|
|||
1 + 1 + 8 bits lengths from 18 to 273
|
||||
|
||||
|
||||
The coding of distances is a little more complicated, so I'll begin
|
||||
The coding of distances is a little more complicated, so I'll begin by
|
||||
explaining a simpler version of the encoding.
|
||||
|
||||
Imagine you need to encode a number from 0 to 2^32 - 1, and you want to
|
||||
|
@ -850,7 +859,7 @@ which you may find by making a bit scan from the left (from the MSB). A
|
|||
position of 0 means that the number is 0 (no bit is set), 1 means the LSB is
|
||||
the first bit set (the number is 1), and 32 means the MSB is set (i.e., the
|
||||
number is >= 0x80000000). Then, if the position is >= 2, you encode the
|
||||
remaining position - 1 bits. Let's call these bits "direct_bits" because
|
||||
remaining position - 1 bits. Let's call these bits "direct bits" because
|
||||
they are coded directly by value instead of indirectly by position.
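Note on the simplified distance coding described above: it can be sketched in C as follows. The sketch covers only the visible description (a position from 0 to 32 followed by position - 1 directly coded bits). The function names are hypothetical, and the real LZMA distance coding differs, as the manual goes on to explain.

```c
#include <stdint.h>

/* Position of the most significant bit set: 0 if n is 0, 1 if n is 1,
   32 if the MSB is set. */
static int bit_position( uint32_t n )
  {
  int pos = 0;
  while( n != 0 ) { ++pos; n >>= 1; }
  return pos;
  }

/* The position - 1 bits below the most significant bit set. */
static uint32_t direct_bits( const uint32_t n, const int pos )
  {
  if( pos < 2 ) return 0;			/* nothing left to code */
  return n & ( ( (uint32_t)1 << ( pos - 1 ) ) - 1 );
  }

/* Rebuild the number from its position and direct bits. */
static uint32_t rebuild( const int pos, const uint32_t direct )
  {
  if( pos == 0 ) return 0;
  return ( (uint32_t)1 << ( pos - 1 ) ) | direct;
  }
```

For n = 0x2D (binary 101101) the position is 6 and the five direct bits are 01101 (0x0D). rebuild( 6, 0x0D ) returns 0x2D again.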
|
||||
|
||||
The inconvenient of this simple method is that it needs 6 bits to encode
|
||||
|
@ -906,9 +915,10 @@ integers representing the probability of the corresponding bit being 0.
|
|||
of 3. The resulting value is in the range 0 to 3.
|
||||
|
||||
|
||||
In the following table, '!literal' is any sequence except a literal
|
||||
byte. 'rep' is any one of 'rep0', 'rep1', 'rep2', or 'rep3'. The types of
|
||||
previous sequences corresponding to each state are:
|
||||
The types of previous sequences corresponding to each state are shown in
|
||||
the following table. '!literal' is any sequence except a literal byte.
|
||||
'rep' is any one of 'rep0', 'rep1', 'rep2', or 'rep3'. The last type in
|
||||
each line is the most recent.
|
||||
|
||||
State Types of previous sequences
|
||||
------------------------------------------------------
|
||||
|
@ -979,9 +989,9 @@ The LZMA stream is consumed one byte at a time by the range decoder. (See
|
|||
of decoded bits, depending on how well these bits agree with their context.
|
||||
(See 'decode_bit' in the source).
|
||||
|
||||
The range decoder state consists of two unsigned 32-bit variables;
|
||||
The range decoder state consists of two unsigned 32-bit variables:
|
||||
'range' (representing the most significant part of the range size not yet
|
||||
decoded), and 'code' (representing the current point within 'range').
|
||||
decoded) and 'code' (representing the current point within 'range').
|
||||
'range' is initialized to 2^32 - 1, and 'code' is initialized to 0.
|
||||
|
||||
The range encoder produces a first 0 byte that must be ignored by the
|
||||
|
@ -993,7 +1003,7 @@ range decoder. This is done by shifting 5 bytes in the initialization of
|
|||
==========================================
|
||||
|
||||
After decoding the member header and obtaining the dictionary size, the
|
||||
range decoder is initialized and then the LZMA decoder enters a loop (See
|
||||
range decoder is initialized and then the LZMA decoder enters a loop (see
|
||||
'decode_member' in the source) where it invokes the range decoder with the
|
||||
appropriate contexts to decode the different coding sequences (matches,
|
||||
repeated matches, and literal bytes), until the "End Of Stream" marker is
|
||||
|
@ -1001,8 +1011,8 @@ decoded.
|
|||
|
||||
Once the "End Of Stream" marker has been decoded, the decompressor reads
|
||||
and decodes the member trailer, and verifies that the three integrity
|
||||
factors (CRC, data size, and member size) match those calculated by the
|
||||
LZMA decoder.
|
||||
factors stored there (CRC, data size, and member size) match those computed
|
||||
from the data.
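Note on the trailer verification described above: the 20-byte member trailer (CRC32, data size, member size, all little endian) can be read and compared as sketched below in C. This is an illustration based on the file format description, not code from clzip or lzd. The names 'Trailer', 'parse_trailer', 'check_trailer', and the mismatch flags are hypothetical.

```c
#include <stdint.h>

enum { bad_crc = 1, bad_data_size = 2, bad_member_size = 4 };

struct Trailer { uint32_t crc; uint64_t data_size; uint64_t member_size; };

/* Read a little endian value of 'num_bytes' bytes. */
static uint64_t get_le( const uint8_t * const p, const int num_bytes )
  {
  uint64_t value = 0;
  int i;
  for( i = num_bytes - 1; i >= 0; --i ) value = ( value << 8 ) | p[i];
  return value;
  }

/* 'buf' must hold the 20 bytes of the member trailer. */
static struct Trailer parse_trailer( const uint8_t * const buf )
  {
  struct Trailer t;
  t.crc = (uint32_t)get_le( buf, 4 );
  t.data_size = get_le( buf + 4, 8 );
  t.member_size = get_le( buf + 12, 8 );
  return t;
  }

/* Compare each stored factor with the value computed while decoding.
   Returns 0 if all three match, else a mask of the mismatching factors,
   so that each mismatch can be reported separately. */
static int check_trailer( const struct Trailer * const t, const uint32_t crc,
                          const uint64_t data_size, const uint64_t member_size )
  {
  int mismatches = 0;
  if( t->crc != crc ) mismatches |= bad_crc;
  if( t->data_size != data_size ) mismatches |= bad_data_size;
  if( t->member_size != member_size ) mismatches |= bad_member_size;
  return mismatches;
  }
```

Returning a mask instead of a single pass/fail value keeps the three factors distinguishable, which is what allows a decompressor to report them one by one.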
|
||||
|
||||
|
||||
File: clzip.info, Node: Trailing data, Next: Examples, Prev: Stream format, Up: Top
|
||||
|
@ -1079,7 +1089,7 @@ show the compression ratio.
|
|||
clzip -v file
|
||||
|
||||
|
||||
Example 3: Like example 1 but the created 'file.lz' is multimember with a
|
||||
Example 3: Like example 2 but the created 'file.lz' is multimember with a
|
||||
member size of 1 MiB. The compression ratio is not shown.
|
||||
|
||||
clzip -b 1MiB file
|
||||
|
@ -1097,15 +1107,7 @@ status.
|
|||
clzip -tv file.lz
|
||||
|
||||
|
||||
Example 6: Compress a whole device in /dev/sdc and send the output to
|
||||
'file.lz'.
|
||||
|
||||
clzip -c /dev/sdc > file.lz
|
||||
or
|
||||
clzip /dev/sdc -o file.lz
|
||||
|
||||
|
||||
Example 7: The right way of concatenating the decompressed output of two or
|
||||
Example 6: The right way of concatenating the decompressed output of two or
|
||||
more compressed files. *Note Trailing data::.
|
||||
|
||||
Don't do this
|
||||
|
@ -1114,18 +1116,26 @@ more compressed files. *Note Trailing data::.
|
|||
clzip -cd file1.lz file2.lz file3.lz
|
||||
|
||||
|
||||
Example 8: Decompress 'file.lz' partially until 10 KiB of decompressed data
|
||||
Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed data
|
||||
are produced.
|
||||
|
||||
clzip -cd file.lz | dd bs=1024 count=10
|
||||
|
||||
|
||||
Example 9: Decompress 'file.lz' partially from decompressed byte at offset
|
||||
Example 8: Decompress 'file.lz' partially from decompressed byte at offset
|
||||
10000 to decompressed byte at offset 14999 (5000 bytes are produced).
|
||||
|
||||
clzip -cd file.lz | dd bs=1000 skip=10 count=5
|
||||
|
||||
|
||||
Example 9: Compress a whole device in /dev/sdc and send the output to
|
||||
'file.lz'.
|
||||
|
||||
clzip -c /dev/sdc > file.lz
|
||||
or
|
||||
clzip /dev/sdc -o file.lz
|
||||
|
||||
|
||||
Example 10: Create a multivolume compressed tar archive with a volume size
|
||||
of 1440 KiB.
|
||||
|
||||
|
@ -1165,7 +1175,7 @@ Appendix A Reference source code
|
|||
********************************
|
||||
|
||||
/* Lzd - Educational decompressor for the lzip format
|
||||
Copyright (C) 2013-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -1195,7 +1205,7 @@ Appendix A Reference source code
|
|||
#include <cstring>
|
||||
#include <stdint.h>
|
||||
#include <unistd.h>
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
|
||||
#include <fcntl.h>
|
||||
#include <io.h>
|
||||
#endif
|
||||
|
@ -1585,7 +1595,7 @@ int main( const int argc, const char * const argv[] )
|
|||
"See the lzip manual for an explanation of the code.\n"
|
||||
"\nUsage: %s [-d] < file.lz > file\n"
|
||||
"Lzd decompresses from standard input to standard output.\n"
|
||||
"\nCopyright (C) 2021 Antonio Diaz Diaz.\n"
|
||||
"\nCopyright (C) 2022 Antonio Diaz Diaz.\n"
|
||||
"License 2-clause BSD.\n"
|
||||
"This is free software: you are free to change and redistribute it.\n"
|
||||
"There is NO WARRANTY, to the extent permitted by law.\n"
|
||||
|
@ -1595,7 +1605,7 @@ int main( const int argc, const char * const argv[] )
|
|||
return 0;
|
||||
}
|
||||
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
|
||||
setmode( STDIN_FILENO, O_BINARY );
|
||||
setmode( STDOUT_FILENO, O_BINARY );
|
||||
#endif
|
||||
|
@ -1677,23 +1687,23 @@ Concept index
|
|||
|
||||
|
||||
Tag Table:
|
||||
Node: Top210
|
||||
Node: Introduction1211
|
||||
Node: Output7184
|
||||
Node: Invoking clzip8787
|
||||
Ref: --trailing-error9585
|
||||
Node: Quality assurance18586
|
||||
Node: File format27545
|
||||
Ref: coded-dict-size28836
|
||||
Node: Algorithm29972
|
||||
Node: Stream format33379
|
||||
Ref: what-is-coded35749
|
||||
Node: Trailing data44618
|
||||
Node: Examples46881
|
||||
Ref: concat-example48493
|
||||
Node: Problems49563
|
||||
Node: Reference source code50099
|
||||
Node: Concept index64964
|
||||
Node: Top205
|
||||
Node: Introduction1207
|
||||
Node: Output7226
|
||||
Node: Invoking clzip8829
|
||||
Ref: --trailing-error9627
|
||||
Node: Quality assurance18961
|
||||
Node: Algorithm27986
|
||||
Node: File format31397
|
||||
Ref: coded-dict-size32827
|
||||
Node: Stream format34062
|
||||
Ref: what-is-coded36459
|
||||
Node: Trailing data45387
|
||||
Node: Examples47650
|
||||
Ref: concat-example49102
|
||||
Node: Problems50332
|
||||
Node: Reference source code50868
|
||||
Node: Concept index65727
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|
326
doc/clzip.texi
|
@ -6,10 +6,10 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 4 January 2021
|
||||
@set VERSION 1.12
|
||||
@set UPDATED 24 January 2022
|
||||
@set VERSION 1.13
|
||||
|
||||
@dircategory Data Compression
|
||||
@dircategory Compression
|
||||
@direntry
|
||||
* Clzip: (clzip). LZMA lossless data compressor
|
||||
@end direntry
|
||||
|
@ -40,8 +40,8 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
|
|||
* Output:: Meaning of clzip's output
|
||||
* Invoking clzip:: Command line interface
|
||||
* Quality assurance:: Design, development, and testing of lzip
|
||||
* File format:: Detailed format of the compressed file
|
||||
* Algorithm:: How clzip compresses the data
|
||||
* File format:: Detailed format of the compressed file
|
||||
* Stream format:: Format of the LZMA stream in lzip files
|
||||
* Trailing data:: Extra data appended to the file
|
||||
* Examples:: A small tutorial with examples
|
||||
|
@ -51,7 +51,7 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
|
|||
@end menu
|
||||
|
||||
@sp 1
|
||||
Copyright @copyright{} 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright @copyright{} 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
@ -71,13 +71,14 @@ C++ compiler.
|
|||
@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip}
|
||||
is a lossless data compressor with a user interface similar to the one
|
||||
of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
|
||||
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
|
||||
interoperability. Lzip can compress about as fast as gzip @w{(lzip -0)} or
|
||||
compress most files more than bzip2 @w{(lzip -9)}. Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
|
||||
a data recovery perspective. Lzip has been designed, written, and tested
|
||||
with great care to replace gzip and bzip2 as the standard general-purpose
|
||||
compressed format for unix-like systems.
|
||||
chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity
|
||||
checking to maximize interoperability and optimize safety. Lzip can compress
|
||||
about as fast as gzip @w{(lzip -0)} or compress most files more than bzip2
|
||||
@w{(lzip -9)}. Decompression speed is intermediate between gzip and bzip2.
|
||||
Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip
|
||||
has been designed, written, and tested with great care to replace gzip and
|
||||
bzip2 as the standard general-purpose compressed format for unix-like
|
||||
systems.
|
||||
|
||||
For compressing/decompressing large files on multiprocessor machines
|
||||
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html,,plzip} can be
|
||||
|
@ -87,8 +88,8 @@ much faster than lzip at the cost of a slightly reduced compression ratio.
|
|||
@end ifnothtml
|
||||
|
||||
For creation and manipulation of compressed tar archives
|
||||
@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be
|
||||
more efficient than using tar and plzip because tarlz is able to keep the
|
||||
@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be more
|
||||
efficient than using tar and plzip because tarlz is able to keep the
|
||||
alignment between tar members and lzip members.
|
||||
@ifnothtml
|
||||
@xref{Top,tarlz manual,,tarlz}.
|
||||
|
@ -129,7 +130,7 @@ the beginning is a thing of the past.
|
|||
|
||||
The member trailer stores the 32-bit CRC of the original data, the size
|
||||
of the original data, and the size of the member. These values, together
|
||||
with the end-of-stream marker, provide a 3 factor integrity checking
|
||||
with the "End Of Stream" marker, provide a 3 factor integrity checking
|
||||
which guarantees that the decompressed version of the data is identical
|
||||
to the original. This guards against corruption of the compressed data,
|
||||
and against undetected bugs in clzip (hopefully very unlikely). The
|
||||
|
@ -165,9 +166,9 @@ file from that of the compressed file as follows:
|
|||
@item anyothername @tab becomes @tab anyothername.out
|
||||
@end multitable
|
||||
|
||||
(De)compressing a file is much like copying or moving it; therefore clzip
|
||||
(De)compressing a file is much like copying or moving it. Therefore clzip
|
||||
preserves the access and modification dates, permissions, and, when
|
||||
possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
|
||||
possible, ownership of the file just as @w{@samp{cp -p}} does. (If the user ID or
|
||||
the group ID can't be duplicated, the file permission bits S_ISUID and
|
||||
S_ISGID are cleared).
|
||||
|
||||
|
@ -305,10 +306,12 @@ and @samp{-S}. @samp{-c} has no effect when testing or listing.
|
|||
|
||||
@item -d
|
||||
@itemx --decompress
|
||||
Decompress the files specified. If a file does not exist or can't be
|
||||
opened, clzip continues decompressing the rest of the files. If a file
|
||||
fails to decompress, or is a terminal, clzip exits immediately without
|
||||
decompressing the rest of the files.
|
||||
Decompress the files specified. If a file does not exist, can't be opened,
|
||||
or the destination file already exists and @samp{--force} has not been
|
||||
specified, clzip continues decompressing the rest of the files and exits with
|
||||
error status 1. If a file fails to decompress, or is a terminal, clzip exits
|
||||
immediately with error status 2 without decompressing the rest of the files.
|
||||
A terminal is considered an uncompressed file, and therefore invalid.
|
||||
|
||||
@item -f
|
||||
@itemx --force
|
||||
|
@ -333,10 +336,11 @@ size, the number of members in the file, and the amount of trailing data (if
|
|||
any) are also printed. With @samp{-vv}, the positions and sizes of each
|
||||
member in multimember files are also printed.
|
||||
|
||||
@samp{-lq} can be used to verify quickly (without decompressing) the
|
||||
structural integrity of the files specified. (Use @samp{--test} to verify
|
||||
the data integrity). @samp{-alq} additionally verifies that none of the
|
||||
files specified contain trailing data.
|
||||
If any file is damaged, does not exist, can't be opened, or is not regular,
|
||||
the final exit status will be @w{> 0}. @samp{-lq} can be used to verify
|
||||
quickly (without decompressing) the structural integrity of the files
|
||||
specified. (Use @samp{--test} to verify the data integrity). @samp{-alq}
|
||||
additionally verifies that none of the files specified contain trailing data.
|
||||
|
||||
@item -m @var{bytes}
|
||||
@itemx --match-length=@var{bytes}
|
||||
|
@ -479,9 +483,9 @@ Table of SI and binary prefixes (unit multipliers):
|
|||
|
||||
@sp 1
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file not
|
||||
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
|
||||
invalid input file, 3 for an internal consistency error (eg, bug) which
|
||||
caused clzip to panic.
|
||||
found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid
|
||||
input file, 3 for an internal consistency error (e.g., bug) which caused
|
||||
clzip to panic.
|
||||
|
||||
|
||||
@node Quality assurance
|
||||
|
@ -635,11 +639,12 @@ and may limit the number of members or the total uncompressed size.
|
|||
@table @samp
|
||||
@item Accurate and robust error detection
|
||||
|
||||
The lzip format provides 3 factor integrity checking and the decompressors
|
||||
report mismatches in each factor separately. This way if just one byte in
|
||||
one factor fails but the other two factors match the data, it probably means
|
||||
that the data are intact and the corruption just affects the mismatching
|
||||
factor (CRC or data size) in the check sequence.
|
||||
The lzip format provides 3 factor integrity checking, and the decompressors
|
||||
report mismatches in each factor separately. This method detects most false
|
||||
positives for corruption. If just one byte in one factor fails but the other
|
||||
two factors match the data, it probably means that the data are intact and
|
||||
the corruption just affects the mismatching factor (CRC, data size, or
|
||||
member size) in the member trailer.
|
||||
|
||||
@item Multiple implementations
|
||||
|
||||
|
@ -678,84 +683,6 @@ into the design of gzip. Both bzip2 and lzip are free from this flaw.
|
|||
@end table
|
||||
|
||||
|
||||
@node File format
|
||||
@chapter File format
|
||||
@cindex file format
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
when there is no longer anything to take away.@*
|
||||
--- Antoine de Saint-Exupery
|
||||
|
||||
@sp 1
|
||||
In the diagram below, a box like this:
|
||||
|
||||
@verbatim
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
@end verbatim
|
||||
|
||||
represents one byte; a box like this:
|
||||
|
||||
@verbatim
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
@end verbatim
|
||||
|
||||
represents a variable number of bytes.
|
||||
|
||||
@sp 1
|
||||
A lzip file consists of a series of "members" (compressed data sets).
|
||||
The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them.
|
||||
|
||||
Each member has the following structure:
|
||||
|
||||
@verbatim
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
@end verbatim
|
||||
|
||||
All multibyte values are stored in little endian order.
|
||||
|
||||
@table @samp
|
||||
@item ID string (the "magic" bytes)
|
||||
A four byte string, identifying the lzip format, with the value "LZIP"
|
||||
(0x4C, 0x5A, 0x49, 0x50).
|
||||
|
||||
@item VN (version number, 1 byte)
|
||||
Just in case something needs to be modified in the future. 1 for now.
|
||||
|
||||
@anchor{coded-dict-size}
|
||||
@item DS (coded dictionary size, 1 byte)
|
||||
The dictionary size is calculated by taking a power of 2 (the base size)
|
||||
and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.@*
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
@item LZMA stream
|
||||
The LZMA stream, finished by an end of stream marker. Uses default values
|
||||
for encoder properties. @xref{Stream format}, for a complete description.
|
||||
|
||||
@item CRC32 (4 bytes)
|
||||
Cyclic Redundancy Check (CRC) of the uncompressed original data.
|
||||
|
||||
@item Data size (8 bytes)
|
||||
Size of the uncompressed original data.
|
||||
|
||||
@item Member size (8 bytes)
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, allows the verification of stream integrity, and
|
||||
facilitates safe recovery of undamaged members from multimember files.
|
||||
|
||||
@end table
|
||||
|
||||
|
||||
@node Algorithm
|
||||
@chapter Algorithm
|
||||
@cindex algorithm
|
||||
|
@ -772,7 +699,7 @@ of finding coding sequences of minimum size than the one currently used by
|
|||
clzip could be developed, and the resulting sequence could also be coded
|
||||
using the LZMA coding scheme.
|
||||
|
||||
Clzip currently implements two variants of the LZMA algorithm; fast
|
||||
Clzip currently implements two variants of the LZMA algorithm: fast
|
||||
(used by option @samp{-0}) and normal (used by all other compression levels).
|
||||
|
||||
The high compression of LZMA comes from combining two basic, well-proven
|
||||
|
@ -784,7 +711,7 @@ contexts according to what the bits are used for.
|
|||
Clzip is a two stage compressor. The first stage is a Lempel-Ziv coder,
|
||||
which reduces redundancy by translating chunks of data to their
|
||||
corresponding distance-length pairs. The second stage is a range encoder
|
||||
that uses a different probability model for each type of data;
|
||||
that uses a different probability model for each type of data:
|
||||
distances, lengths, literal bytes, etc.
|
||||
|
||||
Here is how it works, step by step:
|
||||
|
@ -831,32 +758,112 @@ encoding), Igor Pavlov (for putting all the above together in LZMA), and
|
|||
Julian Seward (for bzip2's CLI).
|
||||
|
||||
|
||||
@node File format
|
||||
@chapter File format
|
||||
@cindex file format
|
||||
|
||||
Perfection is reached, not when there is no longer anything to add, but
|
||||
when there is no longer anything to take away.@*
|
||||
--- Antoine de Saint-Exupery
|
||||
|
||||
@sp 1
|
||||
In the diagram below, a box like this:
|
||||
|
||||
@verbatim
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
@end verbatim
|
||||
|
||||
represents one byte; a box like this:
|
||||
|
||||
@verbatim
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
@end verbatim
|
||||
|
||||
represents a variable number of bytes.
|
||||
|
||||
@sp 1
|
||||
A lzip file consists of a series of independent "members" (compressed data
|
||||
sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them. Each member can
|
||||
encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data.
|
||||
The size of a multimember file is unlimited.
|
||||
|
||||
Each member has the following structure:
|
||||
|
||||
@verbatim
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
|
||||
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
@end verbatim
|
||||
|
||||
All multibyte values are stored in little endian order.
|
||||
|
||||
@table @samp
|
||||
@item ID string (the "magic" bytes)
|
||||
A four byte string, identifying the lzip format, with the value "LZIP"
|
||||
(0x4C, 0x5A, 0x49, 0x50).
|
||||
|
||||
@item VN (version number, 1 byte)
|
||||
Just in case something needs to be modified in the future. 1 for now.
|
||||
|
||||
@anchor{coded-dict-size}
|
||||
@item DS (coded dictionary size, 1 byte)
|
||||
The dictionary size is calculated by taking a power of 2 (the base size)
|
||||
and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.@*
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
@item LZMA stream
|
||||
The LZMA stream, finished by an "End Of Stream" marker. Uses default values
|
||||
for encoder properties. @xref{Stream format}, for a complete description.
|
||||
|
||||
@item CRC32 (4 bytes)
|
||||
Cyclic Redundancy Check (CRC) of the original uncompressed data.
|
||||
|
||||
@item Data size (8 bytes)
|
||||
Size of the original uncompressed data.
|
||||
|
||||
@item Member size (8 bytes)
|
||||
Total size of the member, including header and trailer. This field acts
|
||||
as a distributed index, allows the verification of stream integrity, and
|
||||
facilitates the safe recovery of undamaged members from multimember files.
|
||||
Member size should be limited to @w{2 PiB} to prevent the data size field
|
||||
from overflowing.
|
||||
|
||||
@end table
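The DS field described above can be illustrated with a short C sketch. The
function name decode_dict_size and the convention of returning 0 for an
out-of-range value are assumptions made for this example; they are not taken
from the clzip sources.

```c
/* Hypothetical helper (not part of clzip): decode the coded dictionary
   size byte (DS) of a member header into a size in bytes.
   Returns 0 if the result is outside the valid range of 4 KiB to 512 MiB. */
static unsigned decode_dict_size( const unsigned char ds )
  {
  const unsigned log2_base = ds & 0x1F;         /* bits 4-0: log2 of base size */
  const unsigned numerator = ( ds >> 5 ) & 7;   /* bits 7-5: sixteenths to subtract */

  if( log2_base < 12 || log2_base > 29 ) return 0;
  const unsigned base_size = 1U << log2_base;
  const unsigned size = base_size - numerator * ( base_size / 16 );
  if( size < ( 1U << 12 ) ) return 0;           /* below 4 KiB is invalid */
  return size;
  }
```

With the example from the text, decode_dict_size( 0xD3 ) gives 2^19 - 6 * 2^15
= 327680 bytes (320 KiB).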
|
||||
|
||||
|
||||
@node Stream format
|
||||
@chapter Format of the LZMA stream in lzip files
|
||||
@cindex format of the LZMA stream
|
||||
|
||||
Lzip uses a simplified form of the LZMA stream format chosen to maximize
|
||||
safety and interoperability.
|
||||
|
||||
The LZMA algorithm has three parameters, called "special LZMA
|
||||
properties", to adjust it for some kinds of binary data. These
|
||||
parameters are; @samp{literal_context_bits} (with a default value of 3),
|
||||
parameters are: @samp{literal_context_bits} (with a default value of 3),
|
||||
@samp{literal_pos_state_bits} (with a default value of 0), and
|
||||
@samp{pos_state_bits} (with a default value of 2). As a general purpose
|
||||
compressor, lzip only uses the default values for these parameters. In
|
||||
particular @samp{literal_pos_state_bits} has been optimized away and
|
||||
does not even appear in the code.
|
||||
|
||||
Lzip finishes the LZMA stream with an "End Of Stream" (EOS) marker
|
||||
(the distance-length pair 0xFFFFFFFFU, 2), which in conjunction with the
|
||||
Lzip finishes the LZMA stream with an "End Of Stream" (EOS) marker (the
|
||||
distance-length pair @w{0xFFFFFFFFU, 2}), which in conjunction with the
|
||||
@samp{member size} field in the member trailer allows the verification of
|
||||
stream integrity. The LZMA stream in lzip files always has these two
|
||||
features (default properties and EOS marker) and is referred to in this
|
||||
document as LZMA-302eos. The EOS marker is the only marker allowed in
|
||||
lzip files.
|
||||
stream integrity. The EOS marker is the only marker allowed in lzip files.
|
||||
The LZMA stream in lzip files always has these two features (default
|
||||
properties and EOS marker) and is referred to in this document as
|
||||
LZMA-302eos. This simplified form of the LZMA stream format has been chosen
|
||||
to maximize interoperability and safety.
|
||||
|
||||
The second stage of LZMA is a range encoder that uses a different
|
||||
probability model for each type of symbol; distances, lengths, literal
|
||||
probability model for each type of symbol: distances, lengths, literal
|
||||
bytes, etc. Range encoding conceptually encodes all the symbols of the
|
||||
message into one number. Unlike Huffman coding, which assigns to each
|
||||
symbol a bit-pattern and concatenates all the bit-patterns together,
|
||||
|
@ -864,16 +871,16 @@ range encoding can compress one symbol to less than one bit. Therefore
|
|||
the compressed data produced by a range encoder can't be split in pieces
|
||||
that could be described individually.
|
||||
|
||||
It seems that the only way of describing the LZMA-302eos stream is
|
||||
describing the algorithm that decodes it. And given the many details
|
||||
It seems that the only way of describing the LZMA-302eos stream is to
|
||||
describe the algorithm that decodes it. And given the many details
|
||||
about the range decoder that need to be described accurately, the source
|
||||
code of a real decoder seems the only appropriate reference to use.
|
||||
code of a real decompressor seems the only appropriate reference to use.
|
||||
|
||||
What follows is a description of the decoding algorithm for LZMA-302eos
|
||||
streams using as reference the source code of "lzd", an educational
|
||||
decompressor for lzip files which can be downloaded from the lzip
|
||||
download directory. The source code of lzd is included in appendix A.
|
||||
@xref{Reference source code}.
|
||||
decompressor for lzip files which can be downloaded from the lzip download
|
||||
directory. Lzd is written in C++11 and its source code is included in
|
||||
appendix A. @xref{Reference source code}.
|
||||
|
||||
@sp 1
|
||||
@section What is coded
|
||||
|
@ -911,7 +918,7 @@ Lengths (the @samp{len} in the table above) are coded as follows:
|
|||
@end multitable
|
||||
|
||||
@sp 1
|
||||
The coding of distances is a little more complicated, so I'll begin
|
||||
The coding of distances is a little more complicated, so I'll begin by
|
||||
explaining a simpler version of the encoding.
|
||||
|
||||
Imagine you need to encode a number from 0 to @w{2^32 - 1}, and you want to
|
||||
|
@ -921,7 +928,7 @@ which you may find by making a bit scan from the left (from the MSB). A
|
|||
position of 0 means that the number is 0 (no bit is set), 1 means the LSB is
|
||||
the first bit set (the number is 1), and 32 means the MSB is set (i.e., the
|
||||
number is @w{>= 0x80000000}). Then, if the position is @w{>= 2}, you encode
|
||||
the remaining @w{position - 1} bits. Let's call these bits "direct_bits"
|
||||
the remaining @w{position - 1} bits. Let's call these bits "direct bits"
|
||||
because they are coded directly by value instead of indirectly by position.
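The simplified scheme described so far (a position followed by direct bits)
can be sketched in C as follows. This illustrates only the simplified version
from the paragraph above, not the distance coding actually used by LZMA, and
the function names are hypothetical.

```c
#include <stdint.h>

/* Position of the highest bit set, counted from 1 at the LSB.
   Returns 0 if no bit is set (the number is 0). */
static int top_bit_position( uint32_t n )
  {
  int pos = 0;
  while( n != 0 ) { ++pos; n >>= 1; }
  return pos;
  }

/* The ( pos - 1 ) bits below the highest bit set: the "direct bits".
   There are no direct bits if pos < 2. */
static uint32_t direct_bits( const uint32_t n, const int pos )
  {
  if( pos < 2 ) return 0;
  return n & ( ( (uint32_t)1 << ( pos - 1 ) ) - 1 );
  }

/* Inverse operation: rebuild the number from position and direct bits. */
static uint32_t rebuild_number( const int pos, const uint32_t bits )
  {
  if( pos == 0 ) return 0;
  return ( (uint32_t)1 << ( pos - 1 ) ) | bits;
  }
```

For example, 22 (binary 10110) has position 5 and direct bits 0110 (6), and
rebuild_number( 5, 6 ) gives back 22.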
|
||||
|
||||
The inconvenient of this simple method is that it needs 6 bits to encode the
|
||||
|
@ -981,10 +988,10 @@ of 3. The resulting value is in the range 0 to 3.
|
|||
@end table
|
||||
|
||||
|
||||
In the following table, @samp{!literal} is any sequence except a literal
|
||||
byte. @samp{rep} is any one of @samp{rep0}, @samp{rep1}, @samp{rep2}, or
|
||||
@samp{rep3}. The types of previous sequences corresponding to each state
|
||||
are:
|
||||
The types of previous sequences corresponding to each state are shown in the
|
||||
following table. @samp{!literal} is any sequence except a literal byte.
|
||||
@samp{rep} is any one of @samp{rep0}, @samp{rep1}, @samp{rep2}, or
|
||||
@samp{rep3}. The last type in each line is the most recent.
|
||||
|
||||
@multitable {State} {rep or (!literal, shortrep), literal, literal}
|
||||
@headitem State @tab Types of previous sequences
|
||||
|
@ -1059,9 +1066,9 @@ The LZMA stream is consumed one byte at a time by the range decoder.
|
|||
variable number of decoded bits, depending on how well these bits agree
|
||||
with their context. (See @samp{decode_bit} in the source).
|
||||
|
||||
The range decoder state consists of two unsigned 32-bit variables;
|
||||
The range decoder state consists of two unsigned 32-bit variables:
|
||||
@samp{range} (representing the most significant part of the range size
|
||||
not yet decoded), and @samp{code} (representing the current point within
|
||||
not yet decoded) and @samp{code} (representing the current point within
|
||||
@samp{range}). @samp{range} is initialized to @w{2^32 - 1}, and
|
||||
@samp{code} is initialized to 0.
|
||||
|
||||
|
@ -1075,14 +1082,15 @@ the source).
|
|||
|
||||
After decoding the member header and obtaining the dictionary size, the
|
||||
range decoder is initialized and then the LZMA decoder enters a loop
|
||||
(See @samp{decode_member} in the source) where it invokes the range
|
||||
(see @samp{decode_member} in the source) where it invokes the range
|
||||
decoder with the appropriate contexts to decode the different coding
|
||||
sequences (matches, repeated matches, and literal bytes), until the "End
|
||||
Of Stream" marker is decoded.
|
||||
|
||||
Once the "End Of Stream" marker has been decoded, the decompressor reads and
|
||||
decodes the member trailer, and verifies that the three integrity factors
|
||||
(CRC, data size, and member size) match those calculated by the LZMA decoder.
|
||||
stored there (CRC, data size, and member size) match those computed from the
|
||||
data.
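The trailer check described above can be sketched in C as follows, assuming
the 20-byte little endian trailer layout given in the file format chapter
(CRC32, data size, member size). The struct and function names are
hypothetical and are not taken from clzip or lzd.

```c
#include <stdint.h>

enum { trailer_size = 20 };     /* 4 (CRC32) + 8 (data size) + 8 (member size) */

struct Trailer_fields
  {
  uint32_t crc;
  uint64_t data_size;
  uint64_t member_size;
  };

/* Read a little endian value of 'bytes' bytes from 'buf'. */
static uint64_t get_le( const uint8_t * const buf, const int bytes )
  {
  uint64_t value = 0;
  int i;
  for( i = bytes - 1; i >= 0; --i ) value = ( value << 8 ) | buf[i];
  return value;
  }

static struct Trailer_fields parse_trailer( const uint8_t t[trailer_size] )
  {
  struct Trailer_fields f;
  f.crc = (uint32_t)get_le( t, 4 );
  f.data_size = get_le( t + 4, 8 );
  f.member_size = get_le( t + 12, 8 );
  return f;
  }

/* Compare the stored factors with those computed while decoding.
   Returns a bit mask so that each mismatch can be reported separately:
   1 = CRC, 2 = data size, 4 = member size; 0 means all three match. */
static int check_trailer( const struct Trailer_fields * const stored,
                          const uint32_t crc, const uint64_t data_size,
                          const uint64_t member_size )
  {
  int mismatches = 0;
  if( stored->crc != crc ) mismatches |= 1;
  if( stored->data_size != data_size ) mismatches |= 2;
  if( stored->member_size != member_size ) mismatches |= 4;
  return mismatches;
  }
```

Returning a bit mask instead of a single flag mirrors the idea that the
decompressor reports a mismatch in each factor separately.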
|
||||
|
||||
|
||||
@node Trailing data
|
||||
|
@ -1171,7 +1179,7 @@ clzip -v file
|
|||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 3: Like example 1 but the created @samp{file.lz} is multimember with
|
||||
Example 3: Like example 2 but the created @samp{file.lz} is multimember with
|
||||
a member size of @w{1 MiB}. The compression ratio is not shown.
|
||||
|
||||
@example
|
||||
|
@ -1196,21 +1204,10 @@ show status.
|
|||
clzip -tv file.lz
|
||||
@end example
|
||||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 6: Compress a whole device in /dev/sdc and send the output to
|
||||
@samp{file.lz}.
|
||||
|
||||
@example
|
||||
clzip -c /dev/sdc > file.lz
|
||||
or
|
||||
clzip /dev/sdc -o file.lz
|
||||
@end example
|
||||
|
||||
@sp 1
|
||||
@anchor{concat-example}
|
||||
@noindent
|
||||
Example 7: The right way of concatenating the decompressed output of two or
|
||||
Example 6: The right way of concatenating the decompressed output of two or
|
||||
more compressed files. @xref{Trailing data}.
|
||||
|
||||
@example
|
||||
|
@ -1222,7 +1219,7 @@ Do this instead
|
|||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 8: Decompress @samp{file.lz} partially until @w{10 KiB} of
|
||||
Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of
|
||||
decompressed data are produced.
|
||||
|
||||
@example
|
||||
|
@ -1231,13 +1228,24 @@ clzip -cd file.lz | dd bs=1024 count=10
|
|||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 9: Decompress @samp{file.lz} partially from decompressed byte at
|
||||
Example 8: Decompress @samp{file.lz} partially from decompressed byte at
|
||||
offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).
|
||||
|
||||
@example
|
||||
clzip -cd file.lz | dd bs=1000 skip=10 count=5
|
||||
@end example
|
||||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 9: Compress a whole device in /dev/sdc and send the output to
|
||||
@samp{file.lz}.
|
||||
|
||||
@example
|
||||
clzip -c /dev/sdc > file.lz
|
||||
or
|
||||
clzip /dev/sdc -o file.lz
|
||||
@end example
|
||||
|
||||
@sp 1
|
||||
@noindent
|
||||
Example 10: Create a multivolume compressed tar archive with a volume size
|
||||
|
@ -1287,7 +1295,7 @@ find by running @w{@samp{clzip --version}}.
|
|||
|
||||
@verbatim
|
||||
/* Lzd - Educational decompressor for the lzip format
|
||||
Copyright (C) 2013-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -1317,7 +1325,7 @@ find by running @w{@samp{clzip --version}}.
|
|||
#include <cstring>
|
||||
#include <stdint.h>
|
||||
#include <unistd.h>
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
|
||||
#include <fcntl.h>
|
||||
#include <io.h>
|
||||
#endif
|
||||
|
@ -1707,7 +1715,7 @@ int main( const int argc, const char * const argv[] )
|
|||
"See the lzip manual for an explanation of the code.\n"
|
||||
"\nUsage: %s [-d] < file.lz > file\n"
|
||||
"Lzd decompresses from standard input to standard output.\n"
|
||||
"\nCopyright (C) 2021 Antonio Diaz Diaz.\n"
|
||||
"\nCopyright (C) 2022 Antonio Diaz Diaz.\n"
|
||||
"License 2-clause BSD.\n"
|
||||
"This is free software: you are free to change and redistribute it.\n"
|
||||
"There is NO WARRANTY, to the extent permitted by law.\n"
|
||||
|
@ -1717,7 +1725,7 @@ int main( const int argc, const char * const argv[] )
|
|||
return 0;
|
||||
}
|
||||
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
|
||||
setmode( STDIN_FILENO, O_BINARY );
|
||||
setmode( STDOUT_FILENO, O_BINARY );
|
||||
#endif
|
||||
|
|
131
encoder.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -33,32 +33,25 @@ CRC32 crc32;
|
|||
|
||||
int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs )
|
||||
{
|
||||
int32_t * ptr0 = e->eb.mb.pos_array + ( e->eb.mb.cyclic_pos << 1 );
|
||||
int32_t * ptr1 = ptr0 + 1;
|
||||
int32_t * newptr;
|
||||
int len = 0, len0 = 0, len1 = 0;
|
||||
int maxlen = 3; /* only used if pairs != 0 */
|
||||
int num_pairs = 0;
|
||||
const int pos1 = e->eb.mb.pos + 1;
|
||||
const int min_pos = ( e->eb.mb.pos > e->eb.mb.dictionary_size ) ?
|
||||
e->eb.mb.pos - e->eb.mb.dictionary_size : 0;
|
||||
const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb );
|
||||
int count, key2, key3, key4, newpos1;
|
||||
unsigned tmp;
|
||||
int len_limit = e->match_len_limit;
|
||||
|
||||
if( len_limit > Mb_available_bytes( &e->eb.mb ) )
|
||||
{
|
||||
len_limit = Mb_available_bytes( &e->eb.mb );
|
||||
if( len_limit < 4 ) return 0;
|
||||
}
|
||||
|
||||
tmp = crc32[data[0]] ^ data[1];
|
||||
key2 = tmp & ( num_prev_positions2 - 1 );
|
||||
int maxlen = 3; /* only used if pairs != 0 */
|
||||
int num_pairs = 0;
|
||||
const int min_pos = ( e->eb.mb.pos > e->eb.mb.dictionary_size ) ?
|
||||
e->eb.mb.pos - e->eb.mb.dictionary_size : 0;
|
||||
const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb );
|
||||
|
||||
unsigned tmp = crc32[data[0]] ^ data[1];
|
||||
const int key2 = tmp & ( num_prev_positions2 - 1 );
|
||||
tmp ^= (unsigned)data[2] << 8;
|
||||
key3 = num_prev_positions2 + ( tmp & ( num_prev_positions3 - 1 ) );
|
||||
key4 = num_prev_positions2 + num_prev_positions3 +
|
||||
( ( tmp ^ ( crc32[data[3]] << 5 ) ) & e->eb.mb.key4_mask );
|
||||
const int key3 = num_prev_positions2 + ( tmp & ( num_prev_positions3 - 1 ) );
|
||||
const int key4 = num_prev_positions2 + num_prev_positions3 +
|
||||
( ( tmp ^ ( crc32[data[3]] << 5 ) ) & e->eb.mb.key4_mask );
|
||||
|
||||
if( pairs )
|
||||
{
|
||||
|
@ -67,7 +60,7 @@ int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs )
|
|||
if( np2 > min_pos && e->eb.mb.buffer[np2-1] == data[0] )
|
||||
{
|
||||
pairs[0].dis = e->eb.mb.pos - np2;
|
||||
pairs[0].len = maxlen = 2;
|
||||
pairs[0].len = maxlen = 2 + ( np2 == np3 );
|
||||
num_pairs = 1;
|
||||
}
|
||||
if( np2 != np3 && np3 > min_pos && e->eb.mb.buffer[np3-1] == data[0] )
|
||||
|
@ -86,18 +79,23 @@ int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs )
|
|||
}
|
||||
}
|
||||
|
||||
const int pos1 = e->eb.mb.pos + 1;
|
||||
e->eb.mb.prev_positions[key2] = pos1;
|
||||
e->eb.mb.prev_positions[key3] = pos1;
|
||||
newpos1 = e->eb.mb.prev_positions[key4];
|
||||
int newpos1 = e->eb.mb.prev_positions[key4];
|
||||
e->eb.mb.prev_positions[key4] = pos1;
|
||||
|
||||
int32_t * ptr0 = e->eb.mb.pos_array + ( e->eb.mb.cyclic_pos << 1 );
|
||||
int32_t * ptr1 = ptr0 + 1;
|
||||
int len = 0, len0 = 0, len1 = 0;
|
||||
|
||||
int count;
|
||||
for( count = e->cycles; ; )
|
||||
{
|
||||
int delta;
|
||||
if( newpos1 <= min_pos || --count < 0 ) { *ptr0 = *ptr1 = 0; break; }
|
||||
|
||||
delta = pos1 - newpos1;
|
||||
newptr = e->eb.mb.pos_array +
|
||||
const int delta = pos1 - newpos1;
|
||||
int32_t * const newptr = e->eb.mb.pos_array +
|
||||
( ( e->eb.mb.cyclic_pos - delta +
|
||||
( (e->eb.mb.cyclic_pos >= delta) ? 0 : e->eb.mb.dictionary_size + 1 ) ) << 1 );
|
||||
if( data[len-delta] == data[len] )
|
||||
|
@ -152,7 +150,6 @@ static void LZe_update_distance_prices( struct LZ_encoder * const e )
|
|||
for( len_state = 0; len_state < len_states; ++len_state )
|
||||
{
|
||||
int * const dsp = e->dis_slot_prices[len_state];
|
||||
int * const dp = e->dis_prices[len_state];
|
||||
const Bit_model * const bmds = e->eb.bm_dis_slot[len_state];
|
||||
int slot = 0;
|
||||
for( ; slot < end_dis_model; ++slot )
|
||||
|
@ -161,6 +158,7 @@ static void LZe_update_distance_prices( struct LZ_encoder * const e )
|
|||
dsp[slot] = price_symbol6( bmds, slot ) +
|
||||
(((( slot >> 1 ) - 1 ) - dis_align_bits ) << price_shift_bits );
|
||||
|
||||
int * const dp = e->dis_prices[len_state];
|
||||
for( dis = 0; dis < start_dis_model; ++dis )
|
||||
dp[dis] = dsp[dis];
|
||||
for( ; dis < modeled_distances; ++dis )
|
||||
|
@ -169,7 +167,7 @@ static void LZe_update_distance_prices( struct LZ_encoder * const e )
|
|||
}
|
||||
|
||||
|
||||
/* Returns the number of bytes advanced (ahead).
|
||||
/* Return the number of bytes advanced (ahead).
|
||||
trials[0]..trials[ahead-1] contain the steps to encode.
|
||||
( trials[0].dis4 == -1 ) means literal.
|
||||
A match/rep longer or equal than match_len_limit finishes the sequence.
|
||||
|
@ -178,9 +176,8 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
const int reps[num_rep_distances],
|
||||
const State state )
|
||||
{
|
||||
int main_len, num_pairs, i, rep, num_trials, len;
|
||||
int rep_index = 0, cur = 0;
|
||||
int replens[num_rep_distances];
|
||||
int num_pairs, num_trials;
|
||||
int i, rep, len;
|
||||
|
||||
if( e->pending_num_pairs > 0 ) /* from previous call */
|
||||
{
|
||||
|
@ -189,8 +186,10 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
}
|
||||
else
|
||||
num_pairs = LZe_read_match_distances( e );
|
||||
main_len = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0;
|
||||
const int main_len = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0;
|
||||
|
||||
int replens[num_rep_distances];
|
||||
int rep_index = 0;
|
||||
for( i = 0; i < num_rep_distances; ++i )
|
||||
{
|
||||
replens[i] = Mb_true_match_len( &e->eb.mb, 0, reps[i] + 1 );
|
||||
|
@ -212,10 +211,7 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
return main_len;
|
||||
}
|
||||
|
||||
{
|
||||
const int pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask;
|
||||
const int match_price = price1( e->eb.bm_match[state][pos_state] );
|
||||
const int rep_match_price = match_price + price1( e->eb.bm_rep[state] );
|
||||
const uint8_t prev_byte = Mb_peek( &e->eb.mb, 1 );
|
||||
const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 );
|
||||
const uint8_t match_byte = Mb_peek( &e->eb.mb, reps[0] + 1 );
|
||||
|
@ -227,6 +223,9 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
e->trials[1].price += LZeb_price_matched( &e->eb, prev_byte, cur_byte, match_byte );
|
||||
e->trials[1].dis4 = -1; /* literal */
|
||||
|
||||
const int match_price = price1( e->eb.bm_match[state][pos_state] );
|
||||
const int rep_match_price = match_price + price1( e->eb.bm_rep[state] );
|
||||
|
||||
if( match_byte == cur_byte )
|
||||
Tr_update( &e->trials[1], rep_match_price +
|
||||
LZeb_price_shortrep( &e->eb, state, pos_state ), 0, 0 );
|
||||
|
@ -250,9 +249,8 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
|
||||
for( rep = 0; rep < num_rep_distances; ++rep )
|
||||
{
|
||||
int price;
|
||||
if( replens[rep] < min_match_len ) continue;
|
||||
price = rep_match_price + LZeb_price_rep( &e->eb, rep, state, pos_state );
|
||||
const int price = rep_match_price + LZeb_price_rep( &e->eb, rep, state, pos_state );
|
||||
for( len = min_match_len; len <= replens[rep]; ++len )
|
||||
Tr_update( &e->trials[len], price +
|
||||
Lp_price( &e->rep_len_prices, len, pos_state ), rep, 0 );
|
||||
|
@ -272,17 +270,10 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
if( ++len > e->pairs[i].len && ++i >= num_pairs ) break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int cur = 0;
|
||||
while( true ) /* price optimization loop */
|
||||
{
|
||||
struct Trial *cur_trial, *next_trial;
|
||||
int newlen, pos_state, triable_bytes, len_limit;
|
||||
int start_len = min_match_len;
|
||||
int next_price, match_price, rep_match_price;
|
||||
State cur_state;
|
||||
uint8_t prev_byte, cur_byte, match_byte;
|
||||
|
||||
Mb_move_pos( &e->eb.mb );
|
||||
if( ++cur >= num_trials ) /* no more initialized trials */
|
||||
{
|
||||
|
@ -290,8 +281,8 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
return cur;
|
||||
}
|
||||
|
||||
num_pairs = LZe_read_match_distances( e );
|
||||
newlen = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0;
|
||||
const int num_pairs = LZe_read_match_distances( e );
|
||||
const int newlen = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0;
|
||||
if( newlen >= e->match_len_limit )
|
||||
{
|
||||
e->pending_num_pairs = num_pairs;
|
||||
|
@ -300,7 +291,8 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
}
|
||||
|
||||
/* give final values to current trial */
|
||||
cur_trial = &e->trials[cur];
|
||||
struct Trial * cur_trial = &e->trials[cur];
|
||||
State cur_state;
|
||||
{
|
||||
const int dis4 = cur_trial->dis4;
|
||||
int prev_index = cur_trial->prev_index;
|
||||
|
@ -331,25 +323,25 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
mtf_reps( dis4, cur_trial->reps ); /* literal is ignored */
|
||||
}
|
||||
|
||||
pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask;
|
||||
prev_byte = Mb_peek( &e->eb.mb, 1 );
|
||||
cur_byte = Mb_peek( &e->eb.mb, 0 );
|
||||
match_byte = Mb_peek( &e->eb.mb, cur_trial->reps[0] + 1 );
|
||||
const int pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask;
|
||||
const uint8_t prev_byte = Mb_peek( &e->eb.mb, 1 );
|
||||
const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 );
|
||||
const uint8_t match_byte = Mb_peek( &e->eb.mb, cur_trial->reps[0] + 1 );
|
||||
|
||||
next_price = cur_trial->price +
|
||||
price0( e->eb.bm_match[cur_state][pos_state] );
|
||||
int next_price = cur_trial->price +
|
||||
price0( e->eb.bm_match[cur_state][pos_state] );
|
||||
if( St_is_char( cur_state ) )
|
||||
next_price += LZeb_price_literal( &e->eb, prev_byte, cur_byte );
|
||||
else
|
||||
next_price += LZeb_price_matched( &e->eb, prev_byte, cur_byte, match_byte );
|
||||
|
||||
/* try last updates to next trial */
|
||||
next_trial = &e->trials[cur+1];
|
||||
struct Trial * next_trial = &e->trials[cur+1];
|
||||
|
||||
Tr_update( next_trial, next_price, -1, cur ); /* literal */
|
||||
|
||||
match_price = cur_trial->price + price1( e->eb.bm_match[cur_state][pos_state] );
|
||||
rep_match_price = match_price + price1( e->eb.bm_rep[cur_state] );
|
||||
const int match_price = cur_trial->price + price1( e->eb.bm_match[cur_state][pos_state] );
|
||||
const int rep_match_price = match_price + price1( e->eb.bm_rep[cur_state] );
|
||||
|
||||
if( match_byte == cur_byte && next_trial->dis4 != 0 &&
|
||||
next_trial->prev_index2 == single_step_trial )
|
||||
|
@ -364,11 +356,11 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
}
|
||||
}
|
||||
|
||||
triable_bytes =
|
||||
const int triable_bytes =
|
||||
min( Mb_available_bytes( &e->eb.mb ), max_num_trials - 1 - cur );
|
||||
if( triable_bytes < min_match_len ) continue;
|
||||
|
||||
len_limit = min( e->match_len_limit, triable_bytes );
|
||||
const int len_limit = min( e->match_len_limit, triable_bytes );
|
||||
|
||||
/* try literal + rep0 */
|
||||
if( match_byte != cur_byte && next_trial->prev_index != cur )
|
||||
|
@ -392,19 +384,20 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
}
|
||||
}
|
||||
|
||||
int start_len = min_match_len;
|
||||
|
||||
/* try rep distances */
|
||||
for( rep = 0; rep < num_rep_distances; ++rep )
|
||||
{
|
||||
const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb );
|
||||
const int dis = cur_trial->reps[rep] + 1;
|
||||
int price;
|
||||
|
||||
if( data[0-dis] != data[0] || data[1-dis] != data[1] ) continue;
|
||||
for( len = min_match_len; len < len_limit; ++len )
|
||||
if( data[len-dis] != data[len] ) break;
|
||||
while( num_trials < cur + len )
|
||||
e->trials[++num_trials].price = infinite_price;
|
||||
price = rep_match_price + LZeb_price_rep( &e->eb, rep, cur_state, pos_state );
|
||||
int price = rep_match_price + LZeb_price_rep( &e->eb, rep, cur_state, pos_state );
|
||||
for( i = min_match_len; i <= len; ++i )
|
||||
Tr_update( &e->trials[cur+i], price +
|
||||
Lp_price( &e->rep_len_prices, i, pos_state ), rep, cur );
|
||||
|
@ -412,17 +405,14 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
if( rep == 0 ) start_len = len + 1; /* discard shorter matches */
|
||||
|
||||
/* try rep + literal + rep0 */
|
||||
{
|
||||
int len2 = len + 1;
|
||||
const int limit = min( e->match_len_limit + len2, triable_bytes );
|
||||
int pos_state2;
|
||||
State state2;
|
||||
while( len2 < limit && data[len2-dis] == data[len2] ) ++len2;
|
||||
len2 -= len + 1;
|
||||
if( len2 < min_match_len ) continue;
|
||||
|
||||
pos_state2 = ( pos_state + len ) & pos_state_mask;
|
||||
state2 = St_set_rep( cur_state );
|
||||
int pos_state2 = ( pos_state + len ) & pos_state_mask;
|
||||
State state2 = St_set_rep( cur_state );
|
||||
price += Lp_price( &e->rep_len_prices, len, pos_state ) +
|
||||
price0( e->eb.bm_match[state2][pos_state2] ) +
|
||||
LZeb_price_matched( &e->eb, data[len-1], data[len], data[len-dis] );
|
||||
|
@ -435,21 +425,19 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
e->trials[++num_trials].price = infinite_price;
|
||||
Tr_update3( &e->trials[cur+len+1+len2], price, rep, cur + len + 1, cur );
|
||||
}
|
||||
}
|
||||
|
||||
/* try matches */
|
||||
if( newlen >= start_len && newlen <= len_limit )
|
||||
{
|
||||
int dis;
|
||||
const int normal_match_price = match_price +
|
||||
price0( e->eb.bm_rep[cur_state] );
|
||||
|
||||
while( num_trials < cur + newlen )
|
||||
e->trials[++num_trials].price = infinite_price;
|
||||
|
||||
i = 0;
|
||||
int i = 0;
|
||||
while( e->pairs[i].len < start_len ) ++i;
|
||||
dis = e->pairs[i].dis;
|
||||
int dis = e->pairs[i].dis;
|
||||
for( len = start_len; ; ++len )
|
||||
{
|
||||
int price = normal_match_price + LZe_price_pair( e, dis, len, pos_state );
|
||||
|
@ -502,7 +490,7 @@ bool LZe_encode_member( struct LZ_encoder * const e,
|
|||
int price_counter = 0; /* counters may decrement below 0 */
|
||||
int dis_price_counter = 0;
|
||||
int align_price_counter = 0;
|
||||
int ahead, i;
|
||||
int i;
|
||||
int reps[num_rep_distances];
|
||||
State state = 0;
|
||||
for( i = 0; i < num_rep_distances; ++i ) reps[i] = 0;
|
||||
|
@ -539,7 +527,7 @@ bool LZe_encode_member( struct LZ_encoder * const e,
|
|||
Lp_update_prices( &e->rep_len_prices );
|
||||
}
|
||||
|
||||
ahead = LZe_sequence_optimizer( e, reps, state );
|
||||
int ahead = LZe_sequence_optimizer( e, reps, state );
|
||||
price_counter -= ahead;
|
||||
|
||||
for( i = 0; ahead > 0; )
|
||||
|
@ -556,14 +544,13 @@ bool LZe_encode_member( struct LZ_encoder * const e,
|
|||
const uint8_t prev_byte = Mb_peek( &e->eb.mb, ahead + 1 );
|
||||
const uint8_t cur_byte = Mb_peek( &e->eb.mb, ahead );
|
||||
CRC32_update_byte( &e->eb.crc, cur_byte );
|
||||
if( St_is_char( state ) )
|
||||
if( ( state = St_set_char( state ) ) < 4 )
|
||||
LZeb_encode_literal( &e->eb, prev_byte, cur_byte );
|
||||
else
|
||||
{
|
||||
const uint8_t match_byte = Mb_peek( &e->eb.mb, ahead + reps[0] + 1 );
|
||||
LZeb_encode_matched( &e->eb, prev_byte, cur_byte, match_byte );
|
||||
}
|
||||
state = St_set_char( state );
|
||||
}
|
||||
else /* match or repeated match */
|
||||
{
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -188,10 +188,9 @@ static inline int LZeb_price_rep( const struct LZ_encoder_base * const eb,
|
|||
const int rep, const State state,
|
||||
const int pos_state )
|
||||
{
|
||||
int price;
|
||||
if( rep == 0 ) return price0( eb->bm_rep0[state] ) +
|
||||
price1( eb->bm_len[state][pos_state] );
|
||||
price = price1( eb->bm_rep0[state] );
|
||||
int price = price1( eb->bm_rep0[state] );
|
||||
if( rep == 1 )
|
||||
price += price0( eb->bm_rep1[state] );
|
||||
else
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -40,8 +40,7 @@ bool Mb_read_block( struct Matchfinder_base * const mb )
|
|||
mb->stream_pos += rd;
|
||||
if( rd != size && errno )
|
||||
{ show_error( "Read error", errno, false ); cleanup_and_fail( 1 ); }
|
||||
if( rd < size )
|
||||
{ mb->at_stream_end = true; mb->pos_limit = mb->buffer_size; }
|
||||
if( rd < size ) { mb->at_stream_end = true; mb->pos_limit = mb->buffer_size; }
|
||||
}
|
||||
return mb->pos < mb->stream_pos;
|
||||
}
|
||||
|
@ -77,7 +76,6 @@ bool Mb_init( struct Matchfinder_base * const mb, const int before_size,
|
|||
{
|
||||
const int buffer_size_limit =
|
||||
( dict_factor * dict_size ) + before_size + after_size;
|
||||
unsigned size;
|
||||
int i;
|
||||
|
||||
mb->partial_data_pos = 0;
|
||||
|
@ -107,9 +105,8 @@ bool Mb_init( struct Matchfinder_base * const mb, const int before_size,
|
|||
mb->dictionary_size = dict_size;
|
||||
mb->pos_limit = mb->buffer_size;
|
||||
if( !mb->at_stream_end ) mb->pos_limit -= after_size;
|
||||
size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
|
||||
if( mb->dictionary_size > 1 << 26 ) /* 64 MiB */
|
||||
size >>= 1;
|
||||
unsigned size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
|
||||
if( mb->dictionary_size > 1 << 26 ) size >>= 1; /* 64 MiB */
|
||||
mb->key4_mask = size - 1; /* increases with dictionary size */
|
||||
size += num_prev_positions23;
|
||||
mb->num_prev_positions = size;
|
||||
|
@ -138,11 +135,9 @@ void Mb_reset( struct Matchfinder_base * const mb )
|
|||
Mb_read_block( mb );
|
||||
if( mb->at_stream_end && mb->stream_pos < mb->dictionary_size )
|
||||
{
|
||||
int size;
|
||||
mb->dictionary_size = max( min_dictionary_size, mb->stream_pos );
|
||||
size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
|
||||
if( mb->dictionary_size > 1 << 26 ) /* 64 MiB */
|
||||
size >>= 1;
|
||||
int size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
|
||||
if( mb->dictionary_size > 1 << 26 ) size >>= 1; /* 64 MiB */
|
||||
mb->key4_mask = size - 1;
|
||||
size += mb->num_prev_positions23;
|
||||
mb->num_prev_positions = size;
|
||||
|
@ -169,16 +164,16 @@ void Re_flush_data( struct Range_encoder * const renc )
|
|||
/* End Of Stream marker => (dis == 0xFFFFFFFFU, len == min_match_len) */
|
||||
void LZeb_full_flush( struct LZ_encoder_base * const eb, const State state )
|
||||
{
|
||||
int i;
|
||||
const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask;
|
||||
Lzip_trailer trailer;
|
||||
Re_encode_bit( &eb->renc, &eb->bm_match[state][pos_state], 1 );
|
||||
Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 );
|
||||
LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len, pos_state );
|
||||
Re_flush( &eb->renc );
|
||||
Lzip_trailer trailer;
|
||||
Lt_set_data_crc( trailer, LZeb_crc( eb ) );
|
||||
Lt_set_data_size( trailer, Mb_data_position( &eb->mb ) );
|
||||
Lt_set_member_size( trailer, Re_member_position( &eb->renc ) + Lt_size );
|
||||
int i;
|
||||
for( i = 0; i < Lt_size; ++i )
|
||||
Re_put_byte( &eb->renc, trailer[i] );
|
||||
Re_flush_data( &eb->renc );
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -83,10 +83,9 @@ static inline int price_bit( const Bit_model bm, const bool bit )
|
|||
|
||||
static inline int price_symbol3( const Bit_model bm[], int symbol )
|
||||
{
|
||||
int price;
|
||||
bool bit = symbol & 1;
|
||||
symbol |= 8; symbol >>= 1;
|
||||
price = price_bit( bm[symbol], bit );
|
||||
int price = price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
return price + price_bit( bm[1], symbol & 1 );
|
||||
}
|
||||
|
@ -94,10 +93,9 @@ static inline int price_symbol3( const Bit_model bm[], int symbol )
|
|||
|
||||
static inline int price_symbol6( const Bit_model bm[], unsigned symbol )
|
||||
{
|
||||
int price;
|
||||
bool bit = symbol & 1;
|
||||
symbol |= 64; symbol >>= 1;
|
||||
price = price_bit( bm[symbol], bit );
|
||||
int price = price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
|
@ -108,10 +106,9 @@ static inline int price_symbol6( const Bit_model bm[], unsigned symbol )
|
|||
|
||||
static inline int price_symbol8( const Bit_model bm[], int symbol )
|
||||
{
|
||||
int price;
|
||||
bool bit = symbol & 1;
|
||||
symbol |= 0x100; symbol >>= 1;
|
||||
price = price_bit( bm[symbol], bit );
|
||||
int price = price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
|
||||
|
@ -307,8 +304,7 @@ static inline void Re_encode( struct Range_encoder * const renc,
|
|||
{
|
||||
renc->range >>= 1;
|
||||
if( symbol & mask ) renc->low += renc->range;
|
||||
if( renc->range <= 0x00FFFFFFU )
|
||||
{ renc->range <<= 8; Re_shift_low( renc ); }
|
||||
if( renc->range <= 0x00FFFFFFU ) { renc->range <<= 8; Re_shift_low( renc ); }
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -333,10 +329,9 @@ static inline void Re_encode_bit( struct Range_encoder * const renc,
|
|||
static inline void Re_encode_tree3( struct Range_encoder * const renc,
|
||||
Bit_model bm[], const int symbol )
|
||||
{
|
||||
int model;
|
||||
bool bit = ( symbol >> 2 ) & 1;
|
||||
Re_encode_bit( renc, &bm[1], bit );
|
||||
model = 2 | bit;
|
||||
int model = 2 | bit;
|
||||
bit = ( symbol >> 1 ) & 1;
|
||||
Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit;
|
||||
Re_encode_bit( renc, &bm[model], symbol & 1 );
|
||||
|
@ -345,10 +340,9 @@ static inline void Re_encode_tree3( struct Range_encoder * const renc,
|
|||
static inline void Re_encode_tree6( struct Range_encoder * const renc,
|
||||
Bit_model bm[], const unsigned symbol )
|
||||
{
|
||||
int model;
|
||||
bool bit = ( symbol >> 5 ) & 1;
|
||||
Re_encode_bit( renc, &bm[1], bit );
|
||||
model = 2 | bit;
|
||||
int model = 2 | bit;
|
||||
bit = ( symbol >> 4 ) & 1;
|
||||
Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit;
|
||||
bit = ( symbol >> 3 ) & 1;
|
||||
|
@ -479,8 +473,7 @@ static inline int LZeb_price_matched( const struct LZ_encoder_base * const eb,
|
|||
|
||||
static inline void LZeb_encode_literal( struct LZ_encoder_base * const eb,
|
||||
const uint8_t prev_byte, const uint8_t symbol )
|
||||
{ Re_encode_tree8( &eb->renc, eb->bm_literal[get_lit_state(prev_byte)],
|
||||
symbol ); }
|
||||
{ Re_encode_tree8( &eb->renc, eb->bm_literal[get_lit_state(prev_byte)], symbol ); }
|
||||
|
||||
static inline void LZeb_encode_matched( struct LZ_encoder_base * const eb,
|
||||
const uint8_t prev_byte, const uint8_t symbol, const uint8_t match_byte )
|
||||
|
@ -491,8 +484,8 @@ static inline void LZeb_encode_pair( struct LZ_encoder_base * const eb,
|
|||
const unsigned dis, const int len,
|
||||
const int pos_state )
|
||||
{
|
||||
const unsigned dis_slot = get_slot( dis );
|
||||
Re_encode_len( &eb->renc, &eb->match_len_model, len, pos_state );
|
||||
const unsigned dis_slot = get_slot( dis );
|
||||
Re_encode_tree6( &eb->renc, eb->bm_dis_slot[get_len_state(len)], dis_slot );
|
||||
|
||||
if( dis_slot >= start_dis_model )
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -31,25 +31,24 @@
|
|||
int FLZe_longest_match_len( struct FLZ_encoder * const fe, int * const distance )
|
||||
{
|
||||
enum { len_limit = 16 };
|
||||
const uint8_t * const data = Mb_ptr_to_current_pos( &fe->eb.mb );
|
||||
int32_t * ptr0 = fe->eb.mb.pos_array + fe->eb.mb.cyclic_pos;
|
||||
const int pos1 = fe->eb.mb.pos + 1;
|
||||
int maxlen = 0, newpos1, count;
|
||||
const int available = min( Mb_available_bytes( &fe->eb.mb ), max_match_len );
|
||||
if( available < len_limit ) return 0;
|
||||
|
||||
const uint8_t * const data = Mb_ptr_to_current_pos( &fe->eb.mb );
|
||||
fe->key4 = ( ( fe->key4 << 4 ) ^ data[3] ) & fe->eb.mb.key4_mask;
|
||||
newpos1 = fe->eb.mb.prev_positions[fe->key4];
|
||||
const int pos1 = fe->eb.mb.pos + 1;
|
||||
int newpos1 = fe->eb.mb.prev_positions[fe->key4];
|
||||
fe->eb.mb.prev_positions[fe->key4] = pos1;
|
||||
int32_t * ptr0 = fe->eb.mb.pos_array + fe->eb.mb.cyclic_pos;
|
||||
int maxlen = 0, count;
|
||||
|
||||
for( count = 4; ; )
|
||||
{
|
||||
int32_t * newptr;
|
||||
int delta;
|
||||
if( newpos1 <= 0 || --count < 0 ||
|
||||
( delta = pos1 - newpos1 ) > fe->eb.mb.dictionary_size )
|
||||
{ *ptr0 = 0; break; }
|
||||
newptr = fe->eb.mb.pos_array +
|
||||
int32_t * const newptr = fe->eb.mb.pos_array +
|
||||
( fe->eb.mb.cyclic_pos - delta +
|
||||
( ( fe->eb.mb.cyclic_pos >= delta ) ? 0 : fe->eb.mb.dictionary_size + 1 ) );
|
||||
|
||||
|
@ -118,11 +117,10 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe,
|
|||
Re_encode_bit( &fe->eb.renc, &fe->eb.bm_len[state][pos_state], 1 );
|
||||
else
|
||||
{
|
||||
int distance;
|
||||
Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep1[state], rep > 1 );
|
||||
if( rep > 1 )
|
||||
Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep2[state], rep > 2 );
|
||||
distance = reps[rep];
|
||||
const int distance = reps[rep];
|
||||
for( i = rep; i > 0; --i ) reps[i] = reps[i-1];
|
||||
reps[0] = distance;
|
||||
}
|
||||
|
@ -147,7 +145,6 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe,
|
|||
continue;
|
||||
}
|
||||
|
||||
{
|
||||
const uint8_t prev_byte = Mb_peek( &fe->eb.mb, 1 );
|
||||
const uint8_t cur_byte = Mb_peek( &fe->eb.mb, 0 );
|
||||
const uint8_t match_byte = Mb_peek( &fe->eb.mb, reps[0] + 1 );
|
||||
|
@ -178,12 +175,10 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe,
|
|||
|
||||
/* literal byte */
|
||||
Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[state][pos_state], 0 );
|
||||
if( St_is_char( state ) )
|
||||
if( ( state = St_set_char( state ) ) < 4 )
|
||||
LZeb_encode_literal( &fe->eb, prev_byte, cur_byte );
|
||||
else
|
||||
LZeb_encode_matched( &fe->eb, prev_byte, cur_byte, match_byte );
|
||||
state = St_set_char( state );
|
||||
}
|
||||
}
|
||||
|
||||
LZeb_full_flush( &fe->eb, state );
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
|
69
list.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -52,17 +52,15 @@ int list_files( const char * const filenames[], const int num_filenames,
|
|||
bool stdin_used = false;
|
||||
for( i = 0; i < num_filenames; ++i )
|
||||
{
|
||||
const char * input_filename;
|
||||
struct Lzip_index lzip_index;
|
||||
struct stat in_stats; /* not used */
|
||||
int infd;
|
||||
const bool from_stdin = ( strcmp( filenames[i], "-" ) == 0 );
|
||||
if( from_stdin ) { if( stdin_used ) continue; else stdin_used = true; }
|
||||
input_filename = from_stdin ? "(stdin)" : filenames[i];
|
||||
infd = from_stdin ? STDIN_FILENO :
|
||||
const char * const input_filename = from_stdin ? "(stdin)" : filenames[i];
|
||||
struct stat in_stats; /* not used */
|
||||
const int infd = from_stdin ? STDIN_FILENO :
|
||||
open_instream( input_filename, &in_stats, false, true );
|
||||
if( infd < 0 ) { set_retval( &retval, 1 ); continue; }
|
||||
|
||||
struct Lzip_index lzip_index;
|
||||
Li_init( &lzip_index, infd, ignore_trailing, loose_trailing );
|
||||
close( infd );
|
||||
if( lzip_index.retval != 0 )
|
||||
|
@ -71,37 +69,36 @@ int list_files( const char * const filenames[], const int num_filenames,
|
|||
set_retval( &retval, lzip_index.retval );
|
||||
Li_free( &lzip_index ); continue;
|
||||
}
|
||||
if( verbosity >= 0 )
|
||||
if( verbosity < 0 ) { Li_free( &lzip_index ); continue; }
|
||||
const unsigned long long udata_size = Li_udata_size( &lzip_index );
|
||||
const unsigned long long cdata_size = Li_cdata_size( &lzip_index );
|
||||
total_comp += cdata_size; total_uncomp += udata_size; ++files;
|
||||
const long members = lzip_index.members;
|
||||
if( first_post )
|
||||
{
|
||||
const unsigned long long udata_size = Li_udata_size( &lzip_index );
|
||||
const unsigned long long cdata_size = Li_cdata_size( &lzip_index );
|
||||
total_comp += cdata_size; total_uncomp += udata_size; ++files;
|
||||
if( first_post )
|
||||
{
|
||||
first_post = false;
|
||||
if( verbosity >= 1 ) fputs( " dict memb trail ", stdout );
|
||||
fputs( " uncompressed compressed saved name\n", stdout );
|
||||
}
|
||||
if( verbosity >= 1 )
|
||||
printf( "%s %5ld %6lld ", format_ds( lzip_index.dictionary_size ),
|
||||
lzip_index.members, Li_file_size( &lzip_index ) - cdata_size );
|
||||
list_line( udata_size, cdata_size, input_filename );
|
||||
|
||||
if( verbosity >= 2 && lzip_index.members > 1 )
|
||||
{
|
||||
long i;
|
||||
fputs( " member data_pos data_size member_pos member_size\n", stdout );
|
||||
for( i = 0; i < lzip_index.members; ++i )
|
||||
{
|
||||
const struct Block * db = Li_dblock( &lzip_index, i );
|
||||
const struct Block * mb = Li_mblock( &lzip_index, i );
|
||||
printf( "%6ld %14llu %14llu %14llu %14llu\n",
|
||||
i + 1, db->pos, db->size, mb->pos, mb->size );
|
||||
}
|
||||
first_post = true; /* reprint heading after list of members */
|
||||
}
|
||||
fflush( stdout );
|
||||
first_post = false;
|
||||
if( verbosity >= 1 ) fputs( " dict memb trail ", stdout );
|
||||
fputs( " uncompressed compressed saved name\n", stdout );
|
||||
}
|
||||
if( verbosity >= 1 )
|
||||
printf( "%s %5ld %6lld ", format_ds( lzip_index.dictionary_size ),
|
||||
members, Li_file_size( &lzip_index ) - cdata_size );
|
||||
list_line( udata_size, cdata_size, input_filename );
|
||||
|
||||
if( verbosity >= 2 && members > 1 )
|
||||
{
|
||||
long i;
|
||||
fputs( " member data_pos data_size member_pos member_size\n", stdout );
|
||||
for( i = 0; i < members; ++i )
|
||||
{
|
||||
const struct Block * db = Li_dblock( &lzip_index, i );
|
||||
const struct Block * mb = Li_mblock( &lzip_index, i );
|
||||
printf( "%6ld %14llu %14llu %14llu %14llu\n",
|
||||
i + 1, db->pos, db->size, mb->pos, mb->size );
|
||||
}
|
||||
first_post = true; /* reprint heading after list of members */
|
||||
}
|
||||
fflush( stdout );
|
||||
Li_free( &lzip_index );
|
||||
}
|
||||
if( verbosity >= 0 && files > 1 )
|
||||
|
|
9
lzip.h
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -136,6 +136,7 @@ static inline void CRC32_init( void )
|
|||
static inline void CRC32_update_byte( uint32_t * const crc, const uint8_t byte )
|
||||
{ *crc = crc32[(*crc^byte)&0xFF] ^ ( *crc >> 8 ); }
|
||||
|
||||
/* about as fast as it is possible without messing with endianness */
|
||||
static inline void CRC32_update_buf( uint32_t * const crc,
|
||||
const uint8_t * const buffer,
|
||||
const int size )
|
||||
|
@ -269,12 +270,12 @@ static inline bool Lt_verify_consistency( const Lzip_trailer data )
|
|||
{
|
||||
const unsigned crc = Lt_get_data_crc( data );
|
||||
const unsigned long long dsize = Lt_get_data_size( data );
|
||||
const unsigned long long msize = Lt_get_member_size( data );
|
||||
const unsigned long long mlimit = ( 9 * dsize + 7 ) / 8 + min_member_size;
|
||||
const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
|
||||
if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
|
||||
const unsigned long long msize = Lt_get_member_size( data );
|
||||
if( msize < min_member_size ) return false;
|
||||
const unsigned long long mlimit = ( 9 * dsize + 7 ) / 8 + min_member_size;
|
||||
if( mlimit > dsize && msize > mlimit ) return false;
|
||||
const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
|
||||
if( dlimit > msize && dsize > dlimit ) return false;
|
||||
return true;
|
||||
}
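
The reordered checks in Lt_verify_consistency above can be read as a standalone function. The following is only a sketch under stated assumptions, not the code from lzip.h: the name trailer_fields_consistent is hypothetical, it takes the three trailer fields directly instead of a Lzip_trailer, and the value 36 for min_member_size is assumed because the hunk does not show it.

```c
#include <stdbool.h>

/* Assumed value; the hunk above uses min_member_size without showing it. */
enum { sketch_min_member_size = 36 };

/* Hypothetical standalone version of the check. It takes the trailer
   fields as plain integers instead of decoding a Lzip_trailer. */
static bool trailer_fields_consistent( const unsigned crc,
                                       const unsigned long long dsize,
                                       const unsigned long long msize )
  {
  /* CRC and data size must be both zero or both nonzero */
  if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
  /* a member can't be smaller than the minimum member size */
  if( msize < sketch_min_member_size ) return false;
  /* member too large for the amount of data it claims to hold;
     the 'mlimit > dsize' test skips the check if the sum wrapped around */
  const unsigned long long mlimit =
    ( 9 * dsize + 7 ) / 8 + sketch_min_member_size;
  if( mlimit > dsize && msize > mlimit ) return false;
  /* data too large for the size of the member;
     the 'dlimit > msize' test skips the check if the product wrapped */
  const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
  if( dlimit > msize && dsize > dlimit ) return false;
  return true;
  }
```

Each limit is declared next to the test that uses it, which is the same declaration-at-first-use style this commit applies throughout the C99 sources.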
|
||||
|
|
43
lzip_index.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -133,44 +133,38 @@ static bool Li_skip_trailing_data( struct Lzip_index * const li, const int fd,
|
|||
const bool ignore_trailing,
|
||||
const bool loose_trailing )
|
||||
{
|
||||
if( *pos < min_member_size ) return false;
|
||||
enum { block_size = 16384,
|
||||
buffer_size = block_size + Lt_size - 1 + Lh_size };
|
||||
uint8_t buffer[buffer_size];
|
||||
int bsize = *pos % block_size; /* total bytes in buffer */
|
||||
int search_size, rd_size;
|
||||
unsigned long long ipos;
|
||||
int i;
|
||||
if( *pos < min_member_size ) return false;
|
||||
if( bsize <= buffer_size - block_size ) bsize += block_size;
|
||||
search_size = bsize; /* bytes to search for trailer */
|
||||
rd_size = bsize; /* bytes to read from file */
|
||||
ipos = *pos - rd_size; /* aligned to block_size */
|
||||
int search_size = bsize; /* bytes to search for trailer */
|
||||
int rd_size = bsize; /* bytes to read from file */
|
||||
unsigned long long ipos = *pos - rd_size; /* aligned to block_size */
|
||||
|
||||
while( true )
|
||||
{
|
||||
const uint8_t max_msb = ( ipos + search_size ) >> 56;
|
||||
if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
|
||||
{ Li_set_errno_error( li, "Error seeking member trailer: " );
|
||||
return false; }
|
||||
{ Li_set_errno_error( li, "Error seeking member trailer: " ); return false; }
|
||||
const uint8_t max_msb = ( ipos + search_size ) >> 56;
|
||||
int i;
|
||||
for( i = search_size; i >= Lt_size; --i )
|
||||
if( buffer[i-1] <= max_msb ) /* most significant byte of member_size */
|
||||
{
|
||||
Lzip_header header;
|
||||
const Lzip_header * header2;
|
||||
const Lzip_trailer * const trailer =
|
||||
(const Lzip_trailer *)( buffer + i - Lt_size );
|
||||
const unsigned long long member_size = Lt_get_member_size( *trailer );
|
||||
unsigned dictionary_size;
|
||||
bool full_h2;
|
||||
if( member_size == 0 ) /* skip trailing zeros */
|
||||
{ while( i > Lt_size && buffer[i-9] == 0 ) --i; continue; }
|
||||
if( member_size > ipos + i || !Lt_verify_consistency( *trailer ) )
|
||||
continue;
|
||||
Lzip_header header;
|
||||
if( !Li_read_header( li, fd, header, ipos + i - member_size ) )
|
||||
return false;
|
||||
if( !Lh_verify( header ) ) continue;
|
||||
header2 = (const Lzip_header *)( buffer + i );
|
||||
full_h2 = bsize - i >= Lh_size;
|
||||
const Lzip_header * header2 = (const Lzip_header *)( buffer + i );
|
||||
const bool full_h2 = bsize - i >= Lh_size;
|
||||
if( Lh_verify_prefix( *header2, bsize - i ) ) /* last member */
|
||||
{
|
||||
if( !full_h2 ) add_error( li, "Last member in input file is truncated." );
|
||||
|
@ -183,7 +177,7 @@ static bool Li_skip_trailing_data( struct Lzip_index * const li, const int fd,
|
|||
if( !ignore_trailing )
|
||||
{ add_error( li, trailing_msg ); li->retval = 2; return false; }
|
||||
*pos = ipos + i - member_size;
|
||||
dictionary_size = Lh_get_dictionary_size( header );
|
||||
const unsigned dictionary_size = Lh_get_dictionary_size( header );
|
||||
if( li->dictionary_size < dictionary_size )
|
||||
li->dictionary_size = dictionary_size;
|
||||
return push_back_member( li, 0, Lt_get_data_size( *trailer ), *pos,
|
||||
|
@ -204,9 +198,6 @@ static bool Li_skip_trailing_data( struct Lzip_index * const li, const int fd,
|
|||
bool Li_init( struct Lzip_index * const li, const int infd,
|
||||
const bool ignore_trailing, const bool loose_trailing )
|
||||
{
|
||||
Lzip_header header;
|
||||
unsigned long long pos;
|
||||
long i;
|
||||
li->member_vector = 0;
|
||||
li->error = 0;
|
||||
li->insize = lseek( infd, 0, SEEK_END );
|
||||
|
@ -223,18 +214,17 @@ bool Li_init( struct Lzip_index * const li, const int infd,
|
|||
{ add_error( li, "Input file is too long (2^63 bytes or more)." );
|
||||
li->retval = 2; return false; }
|
||||
|
||||
Lzip_header header;
|
||||
if( !Li_read_header( li, infd, header, 0 ) ) return false;
|
||||
if( Li_check_header_error( li, header ) ) return false;
|
||||
|
||||
pos = li->insize; /* always points to a header or to EOF */
|
||||
unsigned long long pos = li->insize; /* always points to a header or to EOF */
|
||||
while( pos >= min_member_size )
|
||||
{
|
||||
Lzip_trailer trailer;
|
||||
unsigned long long member_size;
|
||||
unsigned dictionary_size;
|
||||
if( seek_read( infd, trailer, Lt_size, pos - Lt_size ) != Lt_size )
|
||||
{ Li_set_errno_error( li, "Error reading member trailer: " ); break; }
|
||||
member_size = Lt_get_member_size( trailer );
|
||||
const unsigned long long member_size = Lt_get_member_size( trailer );
|
||||
if( member_size > pos || !Lt_verify_consistency( trailer ) )
|
||||
{ /* bad trailer */
|
||||
if( li->members <= 0 )
|
||||
|
@ -253,7 +243,7 @@ bool Li_init( struct Lzip_index * const li, const int infd,
|
|||
break;
|
||||
}
|
||||
pos -= member_size;
|
||||
dictionary_size = Lh_get_dictionary_size( header );
|
||||
const unsigned dictionary_size = Lh_get_dictionary_size( header );
|
||||
if( li->dictionary_size < dictionary_size )
|
||||
li->dictionary_size = dictionary_size;
|
||||
if( !push_back_member( li, 0, Lt_get_data_size( trailer ), pos,
|
||||
|
@ -268,6 +258,7 @@ bool Li_init( struct Lzip_index * const li, const int infd,
|
|||
return false;
|
||||
}
|
||||
Li_reverse_member_vector( li );
|
||||
long i;
|
||||
for( i = 0; ; ++i )
|
||||
{
|
||||
const long long end = block_end( li->member_vector[i].dblock );
|
||||
|
|
|
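The Li_init hunks above walk the file backwards: `pos` starts at end of file, each trailer gives the size of the member that ends at `pos`, and `pos` is then moved back by that size. A minimal sketch of that walk over an in-memory buffer follows. It assumes the lzip trailer layout (20 bytes, with the little-endian member size in the last 8 bytes) and a minimum member size of 36 bytes. The names `get_le64` and `count_members_backwards` are hypothetical and not part of clzip, and the real code also checks trailer consistency and the header of each member.

```c
#include <stdint.h>

enum { min_member = 36 };	/* assumed minimum size of a lzip member */

/* read a little-endian 64-bit value */
static unsigned long long get_le64( const uint8_t * const p )
  {
  unsigned long long v = 0;
  int i;
  for( i = 7; i >= 0; --i ) { v <<= 8; v += p[i]; }
  return v;
  }

/* Return the number of members found, or -1 if a member size is
   implausible or the walk does not end exactly at offset 0. */
static long count_members_backwards( const uint8_t * const buf,
                                     const unsigned long long size )
  {
  unsigned long long pos = size;	/* points to a header or to EOF */
  long members = 0;
  while( pos >= min_member )
    {
    /* member_size is stored in the last 8 bytes of the trailer */
    const unsigned long long member_size = get_le64( buf + pos - 8 );
    if( member_size < min_member || member_size > pos ) return -1;
    pos -= member_size;
    ++members;
    }
  return ( pos == 0 ) ? members : -1;
  }
```

Reading the index from the end is what lets '--list' report member sizes without decompressing any data.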
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
|
173
main.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -18,7 +18,7 @@
|
|||
Exit status: 0 for a normal exit, 1 for environmental problems
|
||||
(file not found, invalid flags, I/O errors, etc), 2 to indicate a
|
||||
corrupt or invalid input file, 3 for an internal consistency error
|
||||
(eg, bug) which caused clzip to panic.
|
||||
(e.g., bug) which caused clzip to panic.
|
||||
*/
|
||||
|
||||
#define _FILE_OFFSET_BITS 64
|
||||
|
@ -36,9 +36,9 @@
|
|||
#include <unistd.h>
|
||||
#include <utime.h>
|
||||
#include <sys/stat.h>
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
|
||||
#include <io.h>
|
||||
#if defined(__MSVCRT__)
|
||||
#if defined __MSVCRT__
|
||||
#define fchmod(x,y) 0
|
||||
#define fchown(x,y,z) 0
|
||||
#define strtoull strtoul
|
||||
|
@ -51,7 +51,7 @@
|
|||
#define S_IWOTH 0
|
||||
#endif
|
||||
#endif
|
||||
#if defined(__DJGPP__)
|
||||
#if defined __DJGPP__
|
||||
#define S_ISSOCK(x) 0
|
||||
#define S_ISVTX 0
|
||||
#endif
|
||||
|
@ -72,10 +72,15 @@
|
|||
#error "Environments where CHAR_BIT != 8 are not supported."
|
||||
#endif
|
||||
|
||||
#if ( defined SIZE_MAX && SIZE_MAX < UINT_MAX ) || \
|
||||
( defined SSIZE_MAX && SSIZE_MAX < INT_MAX )
|
||||
#error "Environments where 'size_t' is narrower than 'int' are not supported."
|
||||
#endif
|
||||
|
||||
int verbosity = 0;
|
||||
|
||||
static const char * const program_name = "clzip";
|
||||
static const char * const program_year = "2021";
|
||||
static const char * const program_year = "2022";
|
||||
static const char * invocation_name = "clzip"; /* default value */
|
||||
|
||||
static const struct { const char * from; const char * to; } known_extensions[] = {
|
||||
|
@ -106,13 +111,14 @@ static void show_help( void )
|
|||
"C++ compiler.\n"
|
||||
"\nLzip is a lossless data compressor with a user interface similar to the one\n"
|
||||
"of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov\n"
|
||||
"chain-Algorithm' (LZMA) stream format, chosen to maximize safety and\n"
|
||||
"interoperability. Lzip can compress about as fast as gzip (lzip -0) or\n"
|
||||
"compress most files more than bzip2 (lzip -9). Decompression speed is\n"
|
||||
"intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from\n"
|
||||
"a data recovery perspective. Lzip has been designed, written, and tested\n"
|
||||
"with great care to replace gzip and bzip2 as the standard general-purpose\n"
|
||||
"compressed format for unix-like systems.\n"
|
||||
"chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity\n"
|
||||
"checking to maximize interoperability and optimize safety. Lzip can compress\n"
|
||||
"about as fast as gzip (lzip -0) or compress most files more than bzip2\n"
|
||||
"(lzip -9). Decompression speed is intermediate between gzip and bzip2.\n"
|
||||
"Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip\n"
|
||||
"has been designed, written, and tested with great care to replace gzip and\n"
|
||||
"bzip2 as the standard general-purpose compressed format for unix-like\n"
|
||||
"systems.\n"
|
||||
"\nUsage: %s [options] [files]\n", invocation_name );
|
||||
printf( "\nOptions:\n"
|
||||
" -h, --help display this help and exit\n"
|
||||
|
@ -150,7 +156,7 @@ static void show_help( void )
|
|||
"'tar -xf foo.tar.lz' or 'clzip -cd foo.tar.lz | tar -xf -'.\n"
|
||||
"\nExit status: 0 for a normal exit, 1 for environmental problems (file\n"
|
||||
"not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or\n"
|
||||
"invalid input file, 3 for an internal consistency error (eg, bug) which\n"
|
||||
"invalid input file, 3 for an internal consistency error (e.g., bug) which\n"
|
||||
"caused clzip to panic.\n"
|
||||
"\nThe ideas embodied in clzip are due to (at least) the following people:\n"
|
||||
"Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the\n"
|
||||
|
@ -194,8 +200,6 @@ struct Pretty_print
|
|||
static void Pp_init( struct Pretty_print * const pp,
|
||||
const char * const filenames[], const int num_filenames )
|
||||
{
|
||||
unsigned stdin_name_len;
|
||||
int i;
|
||||
pp->name = 0;
|
||||
pp->padded_name = 0;
|
||||
pp->stdin_name = "(stdin)";
|
||||
|
@ -203,7 +207,8 @@ static void Pp_init( struct Pretty_print * const pp,
|
|||
pp->first_post = false;
|
||||
|
||||
if( verbosity <= 0 ) return;
|
||||
stdin_name_len = strlen( pp->stdin_name );
|
||||
const unsigned stdin_name_len = strlen( pp->stdin_name );
|
||||
int i;
|
||||
for( i = 0; i < num_filenames; ++i )
|
||||
{
|
||||
const char * const s = filenames[i];
|
||||
|
@ -237,16 +242,14 @@ static void Pp_reset( struct Pretty_print * const pp )
|
|||
|
||||
void Pp_show_msg( struct Pretty_print * const pp, const char * const msg )
|
||||
{
|
||||
if( verbosity >= 0 )
|
||||
if( verbosity < 0 ) return;
|
||||
if( pp->first_post )
|
||||
{
|
||||
if( pp->first_post )
|
||||
{
|
||||
pp->first_post = false;
|
||||
fputs( pp->padded_name, stderr );
|
||||
if( !msg ) fflush( stderr );
|
||||
}
|
||||
if( msg ) fprintf( stderr, "%s\n", msg );
|
||||
pp->first_post = false;
|
||||
fputs( pp->padded_name, stderr );
|
||||
if( !msg ) fflush( stderr );
|
||||
}
|
||||
if( msg ) fprintf( stderr, "%s\n", msg );
|
||||
}
|
||||
|
||||
|
||||
|
@ -284,17 +287,53 @@ void show_header( const unsigned dictionary_size )
|
|||
}
|
||||
|
||||
|
||||
static unsigned long long getnum( const char * const ptr,
|
||||
/* separate large numbers >= 100_000 in groups of 3 digits using '_' */
|
||||
static const char * format_num3( unsigned long long num )
|
||||
{
|
||||
const char * const si_prefix = "kMGTPEZY";
|
||||
const char * const binary_prefix = "KMGTPEZY";
|
||||
enum { buffers = 8, bufsize = 4 * sizeof (long long) };
|
||||
static char buffer[buffers][bufsize]; /* circle of static buffers for printf */
|
||||
static int current = 0;
|
||||
int i;
|
||||
char * const buf = buffer[current++]; current %= buffers;
|
||||
char * p = buf + bufsize - 1; /* fill the buffer backwards */
|
||||
*p = 0; /* terminator */
|
||||
if( num > 1024 )
|
||||
{
|
||||
char prefix = 0; /* try binary first, then si */
|
||||
for( i = 0; i < 8 && num >= 1024 && num % 1024 == 0; ++i )
|
||||
{ num /= 1024; prefix = binary_prefix[i]; }
|
||||
if( prefix ) *(--p) = 'i';
|
||||
else
|
||||
for( i = 0; i < 8 && num >= 1000 && num % 1000 == 0; ++i )
|
||||
{ num /= 1000; prefix = si_prefix[i]; }
|
||||
if( prefix ) *(--p) = prefix;
|
||||
}
|
||||
const bool split = num >= 100000;
|
||||
|
||||
for( i = 0; ; )
|
||||
{
|
||||
*(--p) = num % 10 + '0'; num /= 10; if( num == 0 ) break;
|
||||
if( split && ++i >= 3 ) { i = 0; *(--p) = '_'; }
|
||||
}
|
||||
return p;
|
||||
}
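To see what the new format_num3 prints, here is a sketch of the same logic adapted to write into a caller-supplied buffer. The caller-supplied buffer and the name `format_num3_sketch` are assumptions made for illustration; the function in the diff uses a ring of static buffers so that several results can be passed to one printf call.

```c
#include <stdbool.h>

/* Adapted sketch: bufsize must be at least 32 to hold any 64-bit value
   (20 digits, 6 separators, 2 prefix characters, and the terminator). */
static const char * format_num3_sketch( unsigned long long num,
                                        char * const buf, const int bufsize )
  {
  const char * const si_prefix = "kMGTPEZY";
  const char * const binary_prefix = "KMGTPEZY";
  char * p = buf + bufsize - 1;		/* fill the buffer backwards */
  int i;
  *p = 0;				/* terminator */
  if( num > 1024 )
    {
    char prefix = 0;			/* try binary first, then si */
    for( i = 0; i < 8 && num >= 1024 && num % 1024 == 0; ++i )
      { num /= 1024; prefix = binary_prefix[i]; }
    if( prefix ) *(--p) = 'i';
    else
      for( i = 0; i < 8 && num >= 1000 && num % 1000 == 0; ++i )
        { num /= 1000; prefix = si_prefix[i]; }
    if( prefix ) *(--p) = prefix;
    }
  const bool split = num >= 100000;	/* group only large numbers */
  for( i = 0; ; )
    {
    *(--p) = num % 10 + '0'; num /= 10; if( num == 0 ) break;
    if( split && ++i >= 3 ) { i = 0; *(--p) = '_'; }
    }
  return p;
  }
```

With a 32-byte buffer, 100000 is rendered as "100k", 65536 as "64Ki", and 1234567 as "1_234_567". A value of exactly 1024 is printed unchanged because the prefix search only starts above 1024.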
|
||||
|
||||
|
||||
static unsigned long long getnum( const char * const arg,
|
||||
const char * const option_name,
|
||||
const unsigned long long llimit,
|
||||
const unsigned long long ulimit )
|
||||
{
|
||||
unsigned long long result;
|
||||
char * tail;
|
||||
errno = 0;
|
||||
result = strtoull( ptr, &tail, 0 );
|
||||
if( tail == ptr )
|
||||
unsigned long long result = strtoull( arg, &tail, 0 );
|
||||
if( tail == arg )
|
||||
{
|
||||
show_error( "Bad or missing numerical argument.", 0, true );
|
||||
if( verbosity >= 0 )
|
||||
fprintf( stderr, "%s: Bad or missing numerical argument in "
|
||||
"option '%s'.\n", program_name, option_name );
|
||||
exit( 1 );
|
||||
}
|
||||
|
||||
|
@ -317,7 +356,9 @@ static unsigned long long getnum( const char * const ptr,
|
|||
}
|
||||
if( exponent <= 0 )
|
||||
{
|
||||
show_error( "Bad multiplier in numerical argument.", 0, true );
|
||||
if( verbosity >= 0 )
|
||||
fprintf( stderr, "%s: Bad multiplier in numerical argument of "
|
||||
"option '%s'.\n", program_name, option_name );
|
||||
exit( 1 );
|
||||
}
|
||||
for( i = 0; i < exponent; ++i )
|
||||
|
@ -329,21 +370,24 @@ static unsigned long long getnum( const char * const ptr,
|
|||
if( !errno && ( result < llimit || result > ulimit ) ) errno = ERANGE;
|
||||
if( errno )
|
||||
{
|
||||
show_error( "Numerical argument out of limits.", 0, false );
|
||||
if( verbosity >= 0 )
|
||||
fprintf( stderr, "%s: Numerical argument out of limits [%s,%s] "
|
||||
"in option '%s'.\n", program_name, format_num3( llimit ),
|
||||
format_num3( ulimit ), option_name );
|
||||
exit( 1 );
|
||||
}
|
||||
return result;
|
||||
}
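The getnum change above distinguishes a malformed argument from one that is out of range, and reports the option name in both cases. The sketch below shows only that classification. It is a simplified assumption: it returns a status code instead of printing and calling exit, and it rejects any suffix, whereas the real getnum accepts multipliers in a part of the function not shown in this diff. The names `Num_status` and `parse_in_range` are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

enum Num_status { num_ok, num_bad, num_range };

/* Parse arg as an unsigned number (base auto-detected by strtoull) and
   check it against the closed interval [llimit, ulimit]. */
static enum Num_status parse_in_range( const char * const arg,
                                       const unsigned long long llimit,
                                       const unsigned long long ulimit,
                                       unsigned long long * const result )
  {
  char * tail;
  errno = 0;
  *result = strtoull( arg, &tail, 0 );
  if( tail == arg ) return num_bad;	/* no digits were consumed */
  if( *tail != 0 ) return num_bad;	/* sketch: suffixes not handled */
  if( errno || *result < llimit || *result > ulimit ) return num_range;
  return num_ok;
  }
```

A caller can map `num_bad` and `num_range` to the two messages in the diff, adding the option name and, for the range case, the limits.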
|
||||
|
||||
|
||||
static int get_dict_size( const char * const arg )
|
||||
static int get_dict_size( const char * const arg, const char * const option_name )
|
||||
{
|
||||
char * tail;
|
||||
const long bits = strtol( arg, &tail, 0 );
|
||||
if( bits >= min_dictionary_bits &&
|
||||
bits <= max_dictionary_bits && *tail == 0 )
|
||||
return 1 << bits;
|
||||
return getnum( arg, min_dictionary_size, max_dictionary_size );
|
||||
return getnum( arg, option_name, min_dictionary_size, max_dictionary_size );
|
||||
}
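get_dict_size accepts either an exponent or a byte count: a small integer within the valid bit range is read as a power of two, and anything else is parsed as a size in bytes. The sketch below illustrates that two-step interpretation. It assumes limits of 12 and 29 bits (4 KiB to 512 MiB), returns 0 on error instead of exiting, and omits multiplier suffixes. The name `dict_size_from_arg` is hypothetical.

```c
#include <stdlib.h>

enum { min_bits = 12, max_bits = 29 };	/* assumed dictionary size limits */

/* Return the dictionary size in bytes, or 0 if arg is not valid. */
static unsigned dict_size_from_arg( const char * const arg )
  {
  char * tail;
  const long bits = strtol( arg, &tail, 0 );
  if( tail != arg && *tail == 0 && bits >= min_bits && bits <= max_bits )
    return 1U << bits;			/* arg is an exponent */
  /* otherwise interpret arg as a size in bytes */
  const unsigned long long bytes = strtoull( arg, &tail, 0 );
  if( tail == arg || *tail != 0 ) return 0;
  if( bytes < ( 1ULL << min_bits ) || bytes > ( 1ULL << max_bits ) )
    return 0;
  return (unsigned)bytes;
  }
```

Under these assumptions "20" gives 1048576, "65536" gives 65536, and "30" is rejected because it is neither a valid exponent nor a valid byte count.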
|
||||
|
||||
|
||||
|
@ -519,7 +563,7 @@ static bool check_tty_in( const char * const input_filename, const int infd,
|
|||
isatty( infd ) ) /* for example /dev/tty */
|
||||
{ show_file_error( input_filename,
|
||||
"I won't read compressed data from a terminal.", 0 );
|
||||
close( infd ); set_retval( retval, 1 );
|
||||
close( infd ); set_retval( retval, 2 );
|
||||
if( program_mode != m_test ) cleanup_and_fail( *retval );
|
||||
return false; }
|
||||
return true;
|
||||
|
@ -600,7 +644,6 @@ static int compress( const unsigned long long cfile_size,
|
|||
struct Pretty_print * const pp,
|
||||
const struct stat * const in_statsp, const bool zero )
|
||||
{
|
||||
unsigned long long in_size = 0, out_size = 0, partial_volume_size = 0;
|
||||
int retval = 0;
|
||||
struct Poly_encoder encoder = { 0, 0, 0 }; /* polymorphic encoder */
|
||||
if( verbosity >= 1 ) Pp_show_msg( pp, 0 );
|
||||
|
@ -633,6 +676,7 @@ static int compress( const unsigned long long cfile_size,
|
|||
}
|
||||
}
|
||||
|
||||
unsigned long long in_size = 0, out_size = 0, partial_volume_size = 0;
|
||||
while( true ) /* encode one member per iteration */
|
||||
{
|
||||
const unsigned long long size = ( volume_size > 0 ) ?
|
||||
|
@ -729,12 +773,9 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
|
||||
for( first_member = true; ; first_member = false )
|
||||
{
|
||||
int result, size;
|
||||
unsigned dictionary_size;
|
||||
Lzip_header header;
|
||||
struct LZ_decoder decoder;
|
||||
Rd_reset_member_position( &rdec );
|
||||
size = Rd_read_data( &rdec, header, Lh_size );
|
||||
const int size = Rd_read_data( &rdec, header, Lh_size );
|
||||
if( Rd_finished( &rdec ) ) /* End Of File */
|
||||
{
|
||||
if( first_member )
|
||||
|
@ -764,17 +805,18 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
if( !Lh_verify_version( header ) )
|
||||
{ Pp_show_msg( pp, bad_version( Lh_version( header ) ) );
|
||||
retval = 2; break; }
|
||||
dictionary_size = Lh_get_dictionary_size( header );
|
||||
const unsigned dictionary_size = Lh_get_dictionary_size( header );
|
||||
if( !isvalid_ds( dictionary_size ) )
|
||||
{ Pp_show_msg( pp, bad_dict_msg ); retval = 2; break; }
|
||||
|
||||
if( verbosity >= 2 || ( verbosity == 1 && first_member ) )
|
||||
Pp_show_msg( pp, 0 );
|
||||
|
||||
struct LZ_decoder decoder;
|
||||
if( !LZd_init( &decoder, &rdec, dictionary_size, outfd ) )
|
||||
{ Pp_show_msg( pp, mem_msg ); retval = 1; break; }
|
||||
show_dprogress( cfile_size, partial_file_pos, &rdec, pp ); /* init */
|
||||
result = LZd_decode_member( &decoder, pp );
|
||||
const int result = LZd_decode_member( &decoder, pp );
|
||||
partial_file_pos += Rd_member_position( &rdec );
|
||||
LZd_free( &decoder );
|
||||
if( result != 0 )
|
||||
|
@ -911,24 +953,16 @@ int main( const int argc, const char * const argv[] )
|
|||
unsigned long long member_size = max_member_size;
|
||||
unsigned long long volume_size = 0;
|
||||
const char * default_output_filename = "";
|
||||
static struct Arg_parser parser; /* static because valgrind complains */
|
||||
static struct Pretty_print pp; /* and memory management in C sucks */
|
||||
static const char ** filenames = 0;
|
||||
int num_filenames = 0;
|
||||
enum Mode program_mode = m_compress;
|
||||
int argind = 0;
|
||||
int failed_tests = 0;
|
||||
int retval = 0;
|
||||
int i;
|
||||
bool filenames_given = false;
|
||||
bool force = false;
|
||||
bool ignore_trailing = true;
|
||||
bool keep_input_files = false;
|
||||
bool loose_trailing = false;
|
||||
bool recompress = false;
|
||||
bool stdin_used = false;
|
||||
bool to_stdout = false;
|
||||
bool zero = false;
|
||||
if( argc > 0 ) invocation_name = argv[0];
|
||||
|
||||
enum { opt_lt = 256 };
|
||||
const struct ap_Option options[] =
|
||||
|
@ -964,19 +998,22 @@ int main( const int argc, const char * const argv[] )
|
|||
{ opt_lt, "loose-trailing", ap_no },
|
||||
{ 0, 0, ap_no } };
|
||||
|
||||
if( argc > 0 ) invocation_name = argv[0];
|
||||
CRC32_init();
|
||||
|
||||
/* static because valgrind complains and memory management in C sucks */
|
||||
static struct Arg_parser parser;
|
||||
if( !ap_init( &parser, argc, argv, options, 0 ) )
|
||||
{ show_error( mem_msg, 0, false ); return 1; }
|
||||
if( ap_error( &parser ) ) /* bad option */
|
||||
{ show_error( ap_error( &parser ), 0, true ); return 1; }
|
||||
|
||||
int argind = 0;
|
||||
for( ; argind < ap_arguments( &parser ); ++argind )
|
||||
{
|
||||
const int code = ap_code( &parser, argind );
|
||||
const char * const arg = ap_argument( &parser, argind );
|
||||
if( !code ) break; /* no more options */
|
||||
const char * const pn = ap_parsed_name( &parser, argind );
|
||||
const char * const arg = ap_argument( &parser, argind );
|
||||
switch( code )
|
||||
{
|
||||
case '0': case '1': case '2': case '3': case '4':
|
||||
|
@ -984,7 +1021,7 @@ int main( const int argc, const char * const argv[] )
|
|||
zero = ( code == '0' );
|
||||
encoder_options = option_mapping[code-'0']; break;
|
||||
case 'a': ignore_trailing = false; break;
|
||||
case 'b': member_size = getnum( arg, 100000, max_member_size ); break;
|
||||
case 'b': member_size = getnum( arg, pn, 100000, max_member_size ); break;
|
||||
case 'c': to_stdout = true; break;
|
||||
case 'd': set_mode( &program_mode, m_decompress ); break;
|
||||
case 'f': force = true; break;
|
||||
|
@ -993,15 +1030,15 @@ int main( const int argc, const char * const argv[] )
|
|||
case 'k': keep_input_files = true; break;
|
||||
case 'l': set_mode( &program_mode, m_list ); break;
|
||||
case 'm': encoder_options.match_len_limit =
|
||||
getnum( arg, min_match_len_limit, max_match_len );
|
||||
getnum( arg, pn, min_match_len_limit, max_match_len );
|
||||
zero = false; break;
|
||||
case 'n': break;
|
||||
case 'o': if( strcmp( arg, "-" ) == 0 ) to_stdout = true;
|
||||
else { default_output_filename = arg; } break;
|
||||
case 'q': verbosity = -1; break;
|
||||
case 's': encoder_options.dictionary_size = get_dict_size( arg );
|
||||
case 's': encoder_options.dictionary_size = get_dict_size( arg, pn );
|
||||
zero = false; break;
|
||||
case 'S': volume_size = getnum( arg, 100000, max_volume_size ); break;
|
||||
case 'S': volume_size = getnum( arg, pn, 100000, max_volume_size ); break;
|
||||
case 't': set_mode( &program_mode, m_test ); break;
|
||||
case 'v': if( verbosity < 4 ) ++verbosity; break;
|
||||
case 'V': show_version(); return 0;
|
||||
|
@ -1010,15 +1047,17 @@ int main( const int argc, const char * const argv[] )
|
|||
}
|
||||
} /* end process options */
|
||||
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
|
||||
setmode( STDIN_FILENO, O_BINARY );
|
||||
setmode( STDOUT_FILENO, O_BINARY );
|
||||
#endif
|
||||
|
||||
num_filenames = max( 1, ap_arguments( &parser ) - argind );
|
||||
static const char ** filenames = 0;
|
||||
int num_filenames = max( 1, ap_arguments( &parser ) - argind );
|
||||
filenames = resize_buffer( filenames, num_filenames * sizeof filenames[0] );
|
||||
filenames[0] = "-";
|
||||
|
||||
bool filenames_given = false;
|
||||
for( i = 0; argind + i < ap_arguments( &parser ); ++i )
|
||||
{
|
||||
filenames[i] = ap_argument( &parser, argind + i );
|
||||
|
@ -1052,17 +1091,18 @@ int main( const int argc, const char * const argv[] )
|
|||
if( !to_stdout && program_mode != m_test && ( filenames_given || to_file ) )
|
||||
set_signals( signal_handler );
|
||||
|
||||
static struct Pretty_print pp;
|
||||
Pp_init( &pp, filenames, num_filenames );
|
||||
|
||||
int failed_tests = 0;
|
||||
int retval = 0;
|
||||
const bool one_to_one = !to_stdout && program_mode != m_test && !to_file;
|
||||
bool stdin_used = false;
|
||||
for( i = 0; i < num_filenames; ++i )
|
||||
{
|
||||
unsigned long long cfile_size;
|
||||
const char * input_filename = "";
|
||||
int infd;
|
||||
int tmp;
|
||||
struct stat in_stats;
|
||||
const struct stat * in_statsp;
|
||||
|
||||
Pp_set_name( &pp, filenames[i] );
|
||||
if( strcmp( filenames[i], "-" ) == 0 )
|
||||
|
@ -1104,9 +1144,12 @@ int main( const int argc, const char * const argv[] )
|
|||
return 1; /* check tty only once and don't try to delete a tty */
|
||||
}
|
||||
|
||||
in_statsp = ( input_filename[0] && one_to_one ) ? &in_stats : 0;
|
||||
cfile_size = ( input_filename[0] && S_ISREG( in_stats.st_mode ) ) ?
|
||||
( in_stats.st_size + 99 ) / 100 : 0;
|
||||
const struct stat * const in_statsp =
|
||||
( input_filename[0] && one_to_one ) ? &in_stats : 0;
|
||||
const unsigned long long cfile_size =
|
||||
( input_filename[0] && S_ISREG( in_stats.st_mode ) ) ?
|
||||
( in_stats.st_size + 99 ) / 100 : 0;
|
||||
int tmp;
|
||||
if( program_mode == m_compress )
|
||||
tmp = compress( cfile_size, member_size, volume_size, infd,
|
||||
&encoder_options, &pp, in_statsp, zero );
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
#! /bin/sh
|
||||
# check script for Clzip - LZMA lossless data compressor
|
||||
# Copyright (C) 2010-2021 Antonio Diaz Diaz.
|
||||
# Copyright (C) 2010-2022 Antonio Diaz Diaz.
|
||||
#
|
||||
# This script is free software: you have unlimited permission
|
||||
# to copy, distribute, and modify it.
|
||||
|
@ -100,6 +100,7 @@ done
|
|||
printf "LZIP\001-.............................." | "${LZIP}" -t 2> /dev/null
|
||||
printf "LZIP\002-.............................." | "${LZIP}" -t 2> /dev/null
|
||||
printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null
|
||||
rm -f out || framework_failure
|
||||
|
||||
printf "\ntesting decompression..."
|
||||
|
||||
|
@ -123,17 +124,22 @@ lines=$("${LZIP}" -tvv "${in_em}" 2>&1 | wc -l) || test_failed $LINENO
|
|||
lines=$("${LZIP}" -lvv "${in_em}" | wc -l) || test_failed $LINENO
|
||||
[ "${lines}" -eq 11 ] || test_failed $LINENO "${lines}"
|
||||
|
||||
"${LZIP}" -cd "${fox_lz}" > fox || test_failed $LINENO
|
||||
cat "${in_lz}" > copy.lz || framework_failure
|
||||
"${LZIP}" -dk copy.lz || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
printf "to be overwritten" > copy || framework_failure
|
||||
"${LZIP}" -d copy.lz 2> /dev/null
|
||||
cat fox > copy || framework_failure
|
||||
cat "${in_lz}" > out.lz || framework_failure
|
||||
rm -f out || framework_failure
|
||||
"${LZIP}" -d copy.lz out.lz 2> /dev/null # skip copy, decompress out
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
cmp fox copy || test_failed $LINENO
|
||||
cmp in out || test_failed $LINENO
|
||||
"${LZIP}" -df copy.lz || test_failed $LINENO
|
||||
[ ! -e copy.lz ] || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
rm -f copy out || framework_failure
|
||||
|
||||
rm -f copy || framework_failure
|
||||
cat "${in_lz}" > copy.lz || framework_failure
|
||||
"${LZIP}" -d -S100k copy.lz || test_failed $LINENO # ignore -S
|
||||
[ ! -e copy.lz ] || test_failed $LINENO
|
||||
|
@ -167,7 +173,7 @@ rm -f copy anyothername.out || framework_failure
|
|||
[ $? = 1 ] || test_failed $LINENO
|
||||
"${LZIP}" -cdq in "${in_lz}" > copy
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
cat copy in | cmp in - || test_failed $LINENO
|
||||
cat copy in | cmp in - || test_failed $LINENO # copy must be empty
|
||||
"${LZIP}" -cdq nx_file.lz "${in_lz}" > copy
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
|
@ -375,7 +381,6 @@ for i in fox_v2.lz fox_s11.lz fox_de20.lz \
|
|||
[ $? = 2 ] || test_failed $LINENO $i
|
||||
done
|
||||
|
||||
"${LZIP}" -cd "${fox_lz}" > fox || test_failed $LINENO
|
||||
for i in fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do
|
||||
"${LZIP}" -cdq "${testdir}"/$i > out
|
||||
[ $? = 2 ] || test_failed $LINENO $i
|
||||
|
|