
Merging upstream version 1.11.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 20:46:36 +01:00
parent c1d97756f3
commit d865a97d34
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
26 changed files with 1012 additions and 896 deletions


@ -1,7 +1,19 @@
2019-01-03 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.11 released.
* File_* renamed to Lzip_*.
* lzip.h (Lzip_trailer): New function 'Lt_verify_consistency'.
* lzip_index.c: Detect some kinds of corrupt trailers.
* main.c (main): Check return value of close( infd ).
* main.c: Compile on DOS with DJGPP.
* clzip.texi: Improved descriptions of '-0..-9', '-m' and '-s'.
* configure: Accept appending to CFLAGS, 'CFLAGS+=OPTIONS'.
* INSTALL: Document use of CFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
2018-02-06 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.10 released.
* main.c: Added new option '--loose-trailing'.
* Added new option '--loose-trailing'.
* Improved corrupt header detection to HD=3.
* main.c: Show corrupt or truncated header in multimember file.
* main.c (main): Option '-S, --volume-size' now keeps input files.
@ -25,14 +37,14 @@
* Decompression time has been reduced by 7%.
* main.c: Continue testing if any input file is a terminal.
* main.c: Show trailing data in both hexadecimal and ASCII.
* file_index.c: Improve detection of bad dict and trailing data.
* lzip_index.c: Improve detection of bad dict and trailing data.
* lzip.h: Unified messages for bad magic, trailing data, etc.
* clzip.texi: Added missing chapters from lzip.texi.
2016-05-13 Antonio Diaz Diaz <antonio@gnu.org>
* Version 1.8 released.
* main.c: Added new option '-a, --trailing-error'.
* Added new option '-a, --trailing-error'.
* main.c (decompress): Print up to 6 bytes of trailing data
when '-vvvv' is specified.
* decoder.c (LZd_verify_trailer): Removed test of final code.
@ -92,7 +104,7 @@
2011-05-18 Antonio Diaz Diaz <ant_diaz@teleline.es>
* Version 1.2 released.
* main.c: Added new option '-F, --recompress'.
* Added new option '-F, --recompress'.
* main.c (decompress): Print only one status line for each
multimember file when only one '-v' is specified.
* encoder.h (Lee_update_prices): Update high length symbol prices
@ -125,7 +137,7 @@
* Translated to C from the C++ source of lzip 1.10.
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This file is a collection of facts, and thus it is not copyrightable,
but just in case, you have unlimited permission to copy, distribute and

INSTALL

@ -1,10 +1,14 @@
Requirements
------------
You will need a C compiler.
I use gcc 5.3.0 and 4.1.2, but the code should compile with any
standards compliant compiler.
I use gcc 5.3.0 and 4.1.2, but the code should compile with any standards
compliant compiler.
Gcc is available at http://gcc.gnu.org.
The operating system must allow signal handlers read access to objects with
static storage duration so that the cleanup handler for Control-C can delete
the partial output file.
Procedure
---------
@ -23,6 +27,10 @@ the main archive.
cd clzip[version]
./configure
If you are compiling on MinGW, use:
./configure CFLAGS+='-D __USE_MINGW_ANSI_STDIO'
3. Run make.
make
@ -62,7 +70,7 @@ After running 'configure', you can run 'make' and 'make install' as
explained above.
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This file is free documentation: you have unlimited permission to copy,
distribute and modify it.


@ -7,7 +7,7 @@ INSTALL_DIR = $(INSTALL) -d -m 755
SHELL = /bin/sh
CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1
objs = carg_parser.o file_index.o list.o encoder_base.o encoder.o \
objs = carg_parser.o lzip_index.o list.o encoder_base.o encoder.o \
fast_encoder.o decoder.o main.o
@ -35,8 +35,8 @@ decoder.o : lzip.h decoder.h
encoder_base.o : lzip.h encoder_base.h
encoder.o : lzip.h encoder_base.h encoder.h
fast_encoder.o : lzip.h encoder_base.h fast_encoder.h
file_index.o : lzip.h file_index.h
list.o : lzip.h file_index.h
list.o : lzip.h lzip_index.h
lzip_index.o : lzip.h lzip_index.h
main.o : carg_parser.h lzip.h decoder.h encoder_base.h encoder.h fast_encoder.h

NEWS

@ -1,42 +1,17 @@
Changes in version 1.10:
Changes in version 1.11:
The option '--loose-trailing', has been added.
Detection of forbidden combinations of characters in trailing data has been
improved.
The test used by clzip to discriminate trailing data from a corrupt
header in multimember or concatenated files has been improved to a
Hamming distance (HD) of 3, and the 3 bit flips must happen in different
magic bytes for the test to fail. As a consequence some kinds of files
no longer can be appended to a lzip file as trailing data unless the
'--loose-trailing' option is used when decompressing.
Lziprecover can be used to remove conflicting trailing data from a file.
Errors are now also checked when closing the input file.
The contents of a corrupt or truncated header found in a multimember
file is now shown, after the error message, in the same format as
trailing data.
Clzip now compiles on DOS with DJGPP. (Patch from Robert Riebisch).
Option '-S, --volume-size' now keeps input files unchanged.
The descriptions of '-0..-9', '-m' and '-s' in the manual have been
improved.
When creating multimember files or splitting the output in volumes, the
dictionary size is now adjusted for each member individually.
The configure script now accepts appending options to CFLAGS using the
syntax 'CFLAGS+=OPTIONS'.
The 'bits/byte' ratio has been replaced with the inverse compression
ratio in the output.
The progress of decompression is now shown at verbosity level 2 (-vv) or
higher.
Progress of (de)compression is only shown if stderr is a terminal.
A final diagnostic is now shown at verbosity level 1 (-v) or higher if
any file fails the test when testing multiple files.
A second '.lz' extension is no longer added to the argument of '-o' if
it already ends in '.lz' or '.tlz'.
In case of (de)compressed size mismatch, the stored size is now also
shown in hexadecimal to ease visual comparison.
The dictionary size is now shown at verbosity level 4 (-vvvv) when
decompressing or testing.
The new chapter "Meaning of clzip's output" has been added to the manual.
It has been documented in INSTALL the use of
CFLAGS+='-D __USE_MINGW_ANSI_STDIO' when compiling on MinGW.

README

@ -1,32 +1,33 @@
Description
Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
Clzip is a C language version of lzip, fully compatible with lzip 1.4 or
newer. As clzip is written in C, it may be easier to integrate in
applications like package managers, embedded devices, or systems lacking
a C++ compiler.
Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0),
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0)
or compress most files more than bzip2 (lzip -9). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
from a data recovery perspective.
from a data recovery perspective. Lzip has been designed, written and
tested with great care to replace gzip and bzip2 as the standard
general-purpose compressed format for unix-like systems.
The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder
availability:
The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:
* The lzip format provides very safe integrity checking and some data
recovery means. The lziprecover program can repair bit-flip errors
recovery means. The lziprecover program can repair bit flip errors
(one of the most common forms of data corruption) in lzip files,
and provides data recovery capabilities, including error-checked
merging of damaged copies of a file.
* The lzip format is as simple as possible (but not simpler). The
lzip manual provides the source code of a simple decompressor along
with a detailed explanation of how it works, so that with the only
help of the lzip manual it would be possible for a digital
archaeologist to extract the data from a lzip file long after
quantum computers eventually render LZMA obsolete.
lzip manual provides the source code of a simple decompressor
along with a detailed explanation of how it works, so that with
the only help of the lzip manual it would be possible for a
digital archaeologist to extract the data from a lzip file long
after quantum computers eventually render LZMA obsolete.
* Additionally the lzip reference implementation is copylefted, which
guarantees that it will remain free forever.
@ -36,15 +37,14 @@ repair the nearer it is from the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.
Clzip uses the same well-defined exit status values used by lzip and
bzip2, which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs like
tar or zutils.
Clzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.
Clzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit.
Clzip will automatically use for each file the largest dictionary size
that does not exceed neither the file size nor the limit given. Keep in
mind that the decompression memory requirement is affected at
compression time by the choice of dictionary size limit.
The amount of memory required for compression is about 1 or 2 times the
dictionary size limit (1 if input file size is less than dictionary size
@ -64,22 +64,22 @@ anyothername becomes anyothername.out
(De)compressing a file is much like copying or moving it; therefore clzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as "cp -p" does. (If the user ID or
possible, ownership of the file just as 'cp -p' does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).
Clzip is able to read from some types of non regular files if the
"--stdout" option is specified.
'--stdout' option is specified.
If no file names are specified, clzip compresses (or decompresses) from
standard input to standard output. In this case, clzip will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
Clzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
corresponding decompressed files. Integrity testing of concatenated
compressed files is also supported.
Clzip will correctly decompress a file which is the concatenation of two or
more compressed files. The result is the concatenation of the corresponding
decompressed files. Integrity testing of concatenated compressed files is
also supported.
Clzip can produce multimember files, and lziprecover can safely recover
the undamaged members in case of file damage. Clzip can also split the
@ -115,8 +115,12 @@ the definition of Markov chains), G.N.N. Martin (for the definition of
range encoding), Igor Pavlov (for putting all the above together in
LZMA), and Julian Seward (for bzip2's CLI).
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
have been compressed. Decompressed is used to refer to data which have
undergone the process of decompression.
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This file is free documentation: you have unlimited permission to copy,
distribute and modify it.


@ -1,5 +1,5 @@
/* Arg_parser - POSIX/GNU command line argument parser. (C version)
Copyright (C) 2006-2018 Antonio Diaz Diaz.
Copyright (C) 2006-2019 Antonio Diaz Diaz.
This library is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided


@ -1,5 +1,5 @@
/* Arg_parser - POSIX/GNU command line argument parser. (C version)
Copyright (C) 2006-2018 Antonio Diaz Diaz.
Copyright (C) 2006-2019 Antonio Diaz Diaz.
This library is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided

configure

@ -1,12 +1,12 @@
#! /bin/sh
# configure script for Clzip - LZMA lossless data compressor
# Copyright (C) 2010-2018 Antonio Diaz Diaz.
# Copyright (C) 2010-2019 Antonio Diaz Diaz.
#
# This configure script is free software: you have unlimited permission
# to copy, distribute and modify it.
pkgname=clzip
pkgversion=1.10
pkgversion=1.11
progname=clzip
srctrigger=doc/${pkgname}.texi
@ -70,6 +70,7 @@ while [ $# != 0 ] ; do
echo " CC=COMPILER C compiler to use [${CC}]"
echo " CPPFLAGS=OPTIONS command line options for the preprocessor [${CPPFLAGS}]"
echo " CFLAGS=OPTIONS command line options for the C compiler [${CFLAGS}]"
echo " CFLAGS+=OPTIONS append options to the current value of CFLAGS"
echo " LDFLAGS=OPTIONS command line options for the linker [${LDFLAGS}]"
echo
exit 0 ;;
@ -93,10 +94,11 @@ while [ $# != 0 ] ; do
--mandir=*) mandir=${optarg} ;;
--no-create) no_create=yes ;;
CC=*) CC=${optarg} ;;
CPPFLAGS=*) CPPFLAGS=${optarg} ;;
CFLAGS=*) CFLAGS=${optarg} ;;
LDFLAGS=*) LDFLAGS=${optarg} ;;
CC=*) CC=${optarg} ;;
CPPFLAGS=*) CPPFLAGS=${optarg} ;;
CFLAGS=*) CFLAGS=${optarg} ;;
CFLAGS+=*) CFLAGS="${CFLAGS} ${optarg}" ;;
LDFLAGS=*) LDFLAGS=${optarg} ;;
--*)
echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
@ -168,7 +170,7 @@ echo "LDFLAGS = ${LDFLAGS}"
rm -f Makefile
cat > Makefile << EOF
# Makefile for Clzip - LZMA lossless data compressor
# Copyright (C) 2010-2018 Antonio Diaz Diaz.
# Copyright (C) 2010-2019 Antonio Diaz Diaz.
# This file was generated automatically by configure. Don't edit.
#
# This Makefile is free software: you have unlimited permission

decoder.c

@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -101,15 +101,15 @@ void LZd_flush_data( struct LZ_decoder * const d )
static bool LZd_verify_trailer( struct LZ_decoder * const d,
struct Pretty_print * const pp )
{
File_trailer trailer;
int size = Rd_read_data( d->rdec, trailer, Ft_size );
Lzip_trailer trailer;
int size = Rd_read_data( d->rdec, trailer, Lt_size );
const unsigned long long data_size = LZd_data_position( d );
const unsigned long long member_size = Rd_member_position( d->rdec );
unsigned td_crc;
unsigned long long td_size, tm_size;
bool error = false;
if( size < Ft_size )
if( size < Lt_size )
{
error = true;
if( verbosity >= 0 )
@ -118,10 +118,10 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
fprintf( stderr, "Trailer truncated at trailer position %d;"
" some checks may fail.\n", size );
}
while( size < Ft_size ) trailer[size++] = 0;
while( size < Lt_size ) trailer[size++] = 0;
}
td_crc = Ft_get_data_crc( trailer );
td_crc = Lt_get_data_crc( trailer );
if( td_crc != LZd_crc( d ) )
{
error = true;
@ -132,7 +132,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
td_crc, LZd_crc( d ) );
}
}
td_size = Ft_get_data_size( trailer );
td_size = Lt_get_data_size( trailer );
if( td_size != data_size )
{
error = true;
@ -143,7 +143,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
td_size, td_size, data_size, data_size );
}
}
tm_size = Ft_get_member_size( trailer );
tm_size = Lt_get_member_size( trailer );
if( tm_size != member_size )
{
error = true;
@ -214,9 +214,11 @@ int LZd_decode_member( struct LZ_decoder * const d,
Rd_load( rdec );
while( !Rd_finished( rdec ) )
{
int len;
const int pos_state = LZd_data_position( d ) & pos_state_mask;
if( Rd_decode_bit( rdec, &bm_match[state][pos_state] ) == 0 ) /* 1st bit */
if( Rd_decode_bit( rdec, &bm_match[state][pos_state] ) == 0 ) /* 1st bit */
{
/* literal byte */
Bit_model * const bm = bm_literal[get_lit_state(LZd_peek_prev( d ))];
if( St_is_char( state ) )
{
@ -228,83 +230,81 @@ int LZd_decode_member( struct LZ_decoder * const d,
state -= ( state < 10 ) ? 3 : 6;
LZd_put_byte( d, Rd_decode_matched( rdec, bm, LZd_peek( d, rep0 ) ) );
}
continue;
}
else /* match or repeated match */
/* match or repeated match */
if( Rd_decode_bit( rdec, &bm_rep[state] ) != 0 ) /* 2nd bit */
{
int len;
if( Rd_decode_bit( rdec, &bm_rep[state] ) != 0 ) /* 2nd bit */
if( Rd_decode_bit( rdec, &bm_rep0[state] ) == 0 ) /* 3rd bit */
{
if( Rd_decode_bit( rdec, &bm_rep0[state] ) == 0 ) /* 3rd bit */
{
if( Rd_decode_bit( rdec, &bm_len[state][pos_state] ) == 0 ) /* 4th bit */
{ state = St_set_short_rep( state );
LZd_put_byte( d, LZd_peek( d, rep0 ) ); continue; }
}
else
{
unsigned distance;
if( Rd_decode_bit( rdec, &bm_rep1[state] ) == 0 ) /* 4th bit */
distance = rep1;
else
{
if( Rd_decode_bit( rdec, &bm_rep2[state] ) == 0 ) /* 5th bit */
distance = rep2;
else
{ distance = rep3; rep3 = rep2; }
rep2 = rep1;
}
rep1 = rep0;
rep0 = distance;
}
state = St_set_rep( state );
len = min_match_len + Rd_decode_len( rdec, &rep_len_model, pos_state );
if( Rd_decode_bit( rdec, &bm_len[state][pos_state] ) == 0 ) /* 4th bit */
{ state = St_set_short_rep( state );
LZd_put_byte( d, LZd_peek( d, rep0 ) ); continue; }
}
else /* match */
else
{
unsigned distance;
len = min_match_len + Rd_decode_len( rdec, &match_len_model, pos_state );
distance = Rd_decode_tree6( rdec, bm_dis_slot[get_len_state(len)] );
if( distance >= start_dis_model )
if( Rd_decode_bit( rdec, &bm_rep1[state] ) == 0 ) /* 4th bit */
distance = rep1;
else
{
const unsigned dis_slot = distance;
const int direct_bits = ( dis_slot >> 1 ) - 1;
distance = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
distance += Rd_decode_tree_reversed( rdec,
bm_dis + ( distance - dis_slot ), direct_bits );
if( Rd_decode_bit( rdec, &bm_rep2[state] ) == 0 ) /* 5th bit */
distance = rep2;
else
{ distance = rep3; rep3 = rep2; }
rep2 = rep1;
}
rep1 = rep0;
rep0 = distance;
}
state = St_set_rep( state );
len = min_match_len + Rd_decode_len( rdec, &rep_len_model, pos_state );
}
else /* match */
{
unsigned distance;
len = min_match_len + Rd_decode_len( rdec, &match_len_model, pos_state );
distance = Rd_decode_tree6( rdec, bm_dis_slot[get_len_state(len)] );
if( distance >= start_dis_model )
{
const unsigned dis_slot = distance;
const int direct_bits = ( dis_slot >> 1 ) - 1;
distance = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
distance += Rd_decode_tree_reversed( rdec,
bm_dis + ( distance - dis_slot ), direct_bits );
else
{
distance +=
Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits;
distance += Rd_decode_tree_reversed4( rdec, bm_align );
if( distance == 0xFFFFFFFFU ) /* marker found */
{
distance +=
Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits;
distance += Rd_decode_tree_reversed4( rdec, bm_align );
if( distance == 0xFFFFFFFFU ) /* marker found */
Rd_normalize( rdec );
LZd_flush_data( d );
if( len == min_match_len ) /* End Of Stream marker */
{
Rd_normalize( rdec );
LZd_flush_data( d );
if( len == min_match_len ) /* End Of Stream marker */
{
if( LZd_verify_trailer( d, pp ) ) return 0; else return 3;
}
if( len == min_match_len + 1 ) /* Sync Flush marker */
{
Rd_load( rdec ); continue;
}
if( verbosity >= 0 )
{
Pp_show_msg( pp, 0 );
fprintf( stderr, "Unsupported marker code '%d'\n", len );
}
return 4;
if( LZd_verify_trailer( d, pp ) ) return 0; else return 3;
}
if( len == min_match_len + 1 ) /* Sync Flush marker */
{
Rd_load( rdec ); continue;
}
if( verbosity >= 0 )
{
Pp_show_msg( pp, 0 );
fprintf( stderr, "Unsupported marker code '%d'\n", len );
}
return 4;
}
}
rep3 = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance;
state = St_set_match( state );
if( rep0 >= d->dictionary_size || ( rep0 >= d->pos && !d->pos_wrapped ) )
{ LZd_flush_data( d ); return 1; }
}
LZd_copy_block( d, rep0, len );
rep3 = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance;
state = St_set_match( state );
if( rep0 >= d->dictionary_size || ( rep0 >= d->pos && !d->pos_wrapped ) )
{ LZd_flush_data( d ); return 1; }
}
LZd_copy_block( d, rep0, len );
}
LZd_flush_data( d );
return 2;


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by


@ -1,12 +1,23 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
.TH CLZIP "1" "February 2018" "clzip 1.10" "User Commands"
.TH CLZIP "1" "January 2019" "clzip 1.11" "User Commands"
.SH NAME
clzip \- reduces the size of files
.SH SYNOPSIS
.B clzip
[\fI\,options\/\fR] [\fI\,files\/\fR]
.SH DESCRIPTION
Clzip \- LZMA lossless data compressor.
Clzip is a C language version of lzip, fully compatible with lzip 1.4 or
newer. As clzip is written in C, it may be easier to integrate in
applications like package managers, embedded devices, or systems lacking
a C++ compiler.
.PP
Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip \fB\-0\fR)
or compress most files more than bzip2 (lzip \fB\-9\fR). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
from a data recovery perspective. Lzip has been designed, written and
tested with great care to replace gzip and bzip2 as the standard
general\-purpose compressed format for unix\-like systems.
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
@ -52,7 +63,7 @@ suppress all messages
set dictionary size limit in bytes [8 MiB]
.TP
\fB\-S\fR, \fB\-\-volume\-size=\fR<bytes>
set volume size limit in bytes, implies \fB\-k\fR
set volume size limit in bytes
.TP
\fB\-t\fR, \fB\-\-test\fR
test compressed file integrity
@ -93,7 +104,7 @@ Report bugs to lzip\-bug@nongnu.org
.br
Clzip home page: http://www.nongnu.org/lzip/clzip.html
.SH COPYRIGHT
Copyright \(co 2018 Antonio Diaz Diaz.
Copyright \(co 2019 Antonio Diaz Diaz.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.


@ -11,7 +11,7 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir)
Clzip Manual
************
This manual is for Clzip (version 1.10, 6 February 2018).
This manual is for Clzip (version 1.11, 3 January 2019).
* Menu:
@ -29,7 +29,7 @@ This manual is for Clzip (version 1.10, 6 February 2018).
* Concept index:: Index of concepts
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.
@ -40,14 +40,14 @@ File: clzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
1 Introduction
**************
Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
newer. As clzip is written in C, it may be easier to integrate in
applications like package managers, embedded devices, or systems lacking
a C++ compiler.
Clzip is a C language version of lzip, fully compatible with lzip 1.4
or newer. As clzip is written in C, it may be easier to integrate in
applications like package managers, embedded devices, or systems
lacking a C++ compiler.
Lzip is a lossless data compressor with a user interface similar to
the one of gzip or bzip2. Lzip can compress about as fast as gzip
(lzip -0), or compress most files more than bzip2 (lzip -9).
(lzip -0) or compress most files more than bzip2 (lzip -9).
Decompression speed is intermediate between gzip and bzip2. Lzip is
better than gzip and bzip2 from a data recovery perspective.
@ -88,15 +88,15 @@ microscopic. Be aware, though, that the check occurs upon
decompression, so it can only tell you that something is wrong. It
can't help you recover the original uncompressed data.
Clzip uses the same well-defined exit status values used by lzip and
bzip2, which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs like
tar or zutils.
Clzip uses the same well-defined exit status values used by lzip,
which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs
like tar or zutils.
Clzip will automatically use the smallest possible dictionary size
for each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit.
Clzip will automatically use for each file the largest dictionary
size that does not exceed neither the file size nor the limit given.
Keep in mind that the decompression memory requirement is affected at
compression time by the choice of dictionary size limit.
The amount of memory required for compression is about 1 or 2 times
the dictionary size limit (1 if input file size is less than dictionary
@ -116,7 +116,7 @@ anyothername becomes anyothername.out
(De)compressing a file is much like copying or moving it; therefore
clzip preserves the access and modification dates, permissions, and,
when possible, ownership of the file just as "cp -p" does. (If the user
when possible, ownership of the file just as 'cp -p' does. (If the user
ID or the group ID can't be duplicated, the file permission bits
S_ISUID and S_ISGID are cleared).
@ -214,6 +214,7 @@ command line.
'-V'
'--version'
Print the version number of clzip on the standard output and exit.
This version number should be included in all bug reports.
'-a'
'--trailing-error'
@ -298,12 +299,14 @@ command line.
'-s BYTES'
'--dictionary-size=BYTES'
When compressing, set the dictionary size limit in bytes. Clzip
will use the smallest possible dictionary size for each file
without exceeding this limit. Valid values range from 4 KiB to
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
the specified size does not match one of the valid sizes, it will
be rounded upwards by adding up to (BYTES / 8) to it.
will use for each file the largest dictionary size that does not
exceed neither the file size nor this limit. Valid values range
from 4 KiB to 512 MiB. Values 12 to 29 are interpreted as powers
of two, meaning 2^12 to 2^29 bytes. Dictionary sizes are quantized
so that they can be coded in just one byte (*note
coded-dict-size::). If the specified size does not match one of
the valid sizes, it will be rounded upwards by adding up to
(BYTES / 8) to it.
For maximum compression you should use a dictionary size limit as
large as possible, but keep in mind that the decompression memory
@ -342,27 +345,32 @@ command line.
Two or more '-v' options show the progress of (de)compression.
'-0 .. -9'
Set the compression parameters (dictionary size and match length
limit) as shown in the table below. The default compression level
is '-6'. Note that '-9' can be much slower than '-0'. These
options have no effect when decompressing, testing or listing.
Compression level. Set the compression parameters (dictionary size
and match length limit) as shown in the table below. The default
compression level is '-6', equivalent to '-s8MiB -m36'. Note that
'-9' can be much slower than '-0'. These options have no effect
when decompressing, testing or listing.
The bidimensional parameter space of LZMA can't be mapped to a
linear scale optimal for all files. If your files are large, very
repetitive, etc, you may need to use the '--dictionary-size' and
'--match-length' options directly to achieve optimal performance.
Level Dictionary size Match length limit
-0 64 KiB 16 bytes
-1 1 MiB 5 bytes
-2 1.5 MiB 6 bytes
-3 2 MiB 8 bytes
-4 3 MiB 12 bytes
-5 4 MiB 20 bytes
-6 8 MiB 36 bytes
-7 16 MiB 68 bytes
-8 24 MiB 132 bytes
-9 32 MiB 273 bytes
If several compression levels or '-s' or '-m' options are given,
the last setting is used. For example '-9 -s64MiB' is equivalent
to '-s64MiB -m273'
Level Dictionary size (-s) Match length limit (-m)
-0 64 KiB 16 bytes
-1 1 MiB 5 bytes
-2 1.5 MiB 6 bytes
-3 2 MiB 8 bytes
-4 3 MiB 12 bytes
-5 4 MiB 20 bytes
-6 8 MiB 36 bytes
-7 16 MiB 68 bytes
-8 24 MiB 132 bytes
-9 32 MiB 273 bytes
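
The level table above can be read as a simple lookup from level to the
'-s' and '-m' values. The following is a minimal sketch of that reading,
not code from clzip; the names 'Level_params', 'level_table' and
'level_params' are hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Hypothetical types and names, not taken from the clzip sources.
   The values are copied from the level table in the manual. */
struct Level_params
  {
  unsigned dictionary_size;     /* bytes, same meaning as '-s' */
  int match_len_limit;          /* bytes, same meaning as '-m' */
  };

static const struct Level_params level_table[10] =
  {
  { 64U << 10,  16 },           /* -0: 64 KiB  */
  {  1U << 20,   5 },           /* -1: 1 MiB   */
  {  3U << 19,   6 },           /* -2: 1.5 MiB */
  {  1U << 21,   8 },           /* -3: 2 MiB   */
  {  3U << 20,  12 },           /* -4: 3 MiB   */
  {  1U << 22,  20 },           /* -5: 4 MiB   */
  {  1U << 23,  36 },           /* -6: 8 MiB (default) */
  {  1U << 24,  68 },           /* -7: 16 MiB  */
  {  3U << 23, 132 },           /* -8: 24 MiB  */
  {  1U << 25, 273 }            /* -9: 32 MiB  */
  };

/* Returns the parameters for 'level' (0 to 9), or NULL if out of range. */
const struct Level_params * level_params( const int level )
  {
  if( level < 0 || level > 9 ) return NULL;
  return &level_table[level];
  }
```

Under the "last setting is used" rule quoted above, '-9 -s64MiB' would
start from the '-9' entry and then overwrite only the dictionary size,
leaving the match length limit at 273.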
'--fast'
'--best'
@ -409,10 +417,10 @@ is to make it so complicated that there are no obvious deficiencies. The
first method is far more difficult.
-- C.A.R. Hoare
Lzip has been designed, written and tested with great care to be the
standard general-purpose compressor for unix-like systems. This chapter
describes the lessons learned from previous compressors (gzip and
bzip2), and their application to the design of lzip.
Lzip has been designed, written and tested with great care to replace
gzip and bzip2 as the standard general-purpose compressed format for
unix-like systems. This chapter describes the lessons learned from
these previous formats, and their application to the design of lzip.
4.1 Format design
@ -455,17 +463,20 @@ error detection. Any distance larger than the dictionary size acts as a
forbidden symbol, allowing the decompressor to detect the approximate
position of errors, and leaving very little work for the check sequence
(CRC and data sizes) in the detection of errors. Lzip is usually able
to detect all posible bit flips in the compressed data without
to detect all possible bit flips in the compressed data without
resorting to the check sequence. It would be difficult to write an
automatic recovery tool like lziprecover for the gzip format. And, as
far as I know, it has never been written.
Lzip, like gzip and bzip2, uses a CRC32 to check the integrity of the
decompressed data because it provides more accurate error detection than
CRC64 up to a compressed size of about 16 GiB, a size larger than that
of most files. In the case of lzip, the additional detection capability
of the decompressor reduces the probability of undetected errors more
than a million times beyond what the CRC32 alone provides.
decompressed data because it provides optimal accuracy in the detection
of errors up to a compressed size of about 16 GiB, a size larger than
that of most files. In the case of lzip, the additional detection
capability of the decompressor reduces the probability of undetected
errors about four million times more, resulting in a combined integrity
checking optimally accurate for any member size produced by lzip.
Preliminary results suggest that the lzip format is safe enough to be
used in critical safety avionics systems.
The lzip format is designed for long-term archiving. Therefore it
excludes any unneeded features that may interfere with the future
@ -520,7 +531,7 @@ extraction of the decompressed data.
Bzip2 does not store the uncompressed size of the file.
The lzip format provides a 64-bit field for the uncompressed size.
Additionaly, lzip produces multimember output automatically when
Additionally, lzip produces multimember output automatically when
the size is too large for a single member, allowing for an
unlimited uncompressed size.
@ -568,9 +579,9 @@ extraction of the decompressed data.
(lziprecover)Unzcrash.
'Dictionary size'
Lzip automatically uses the smallest possible dictionary size for
each file. In addition to reducing the amount of memory required
for decompression, this feature also minimizes the probability of
Lzip automatically adapts the dictionary size to the size of each
file. In addition to reducing the amount of memory required for
decompression, this feature also minimizes the probability of
being affected by RAM errors during compression.
'Exit status'
@ -624,11 +635,11 @@ additional information before, between, or after them.
'DS (coded dictionary size, 1 byte)'
The dictionary size is calculated by taking a power of 2 (the base
size) and substracting from it a fraction between 0/16 and 7/16 of
size) and subtracting from it a fraction between 0/16 and 7/16 of
the base size.
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
Bits 7-5 contain the numerator of the fraction (0 to 7) to
substract from the base size to obtain the dictionary size.
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
from the base size to obtain the dictionary size.
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
Valid values for dictionary size range from 4 KiB to 512 MiB.
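
The DS rule above can be sketched in C as follows. This is a minimal illustration, not code taken from clzip: the function name `decode_dict_size` is hypothetical, and returning 0 for bytes whose result falls outside the 4 KiB to 512 MiB range is an assumption of this sketch (a real decoder may treat such bytes differently).

```c
#include <assert.h>
#include <stdint.h>

/* Decode a DS byte into a dictionary size in bytes.
   Hypothetical helper; returns 0 when the coded value is out of range. */
static unsigned long decode_dict_size( const uint8_t ds )
  {
  const int log2_base = ds & 0x1F;              /* bits 4-0: log2 of base size */
  const unsigned numerator = ( ds >> 5 ) & 7;   /* bits 7-5: sixteenths to subtract */
  unsigned long base, size;

  if( log2_base < 12 || log2_base > 29 ) return 0;
  base = 1UL << log2_base;
  size = base - numerator * ( base / 16 );
  assert( size <= base );                       /* subtracting never grows the size */
  if( size < 4096 ) return 0;                   /* below 4 KiB: rejected in this sketch */
  return size;
  }
```

With the example byte from the text, `decode_dict_size( 0xD3 )` yields 327680 bytes, that is, 320 KiB.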
@ -767,7 +778,7 @@ reusing a recently used distance). There are 7 different coding
sequences:
Bit sequence Name Description
---------------------------------------------------------------------------
------------------------------------------------------------------------
0 + byte literal literal byte
1 + 0 + len + dis match distance-length pair
1 + 1 + 0 + 0 shortrep 1 byte match at latest used distance
@ -787,7 +798,7 @@ order, from MSB to LSB, except where noted otherwise.
Lengths (the 'len' in the table above) are coded as follows:
Bit sequence Description
--------------------------------------------------------------------------
------------------------------------------------------------------------
0 + 3 bits lengths from 2 to 9
1 + 0 + 3 bits lengths from 10 to 17
1 + 1 + 8 bits lengths from 18 to 273
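
As a rough illustration of the length table above, the following C sketch maps the already decoded prefix bits and payload to the length they represent. It assumes the bits have been obtained elsewhere (in the real format they come from the range decoder); the function name `length_from_fields` and its parameters are hypothetical.

```c
#include <assert.h>

/* Map the fields of a coded length to the match length.
   choice1, choice2: the prefix bits of the table (each 0 or 1).
   value: the 3-bit or 8-bit payload that follows the prefix. */
static int length_from_fields( const int choice1, const int choice2,
                               const unsigned value )
  {
  assert( ( choice1 == 0 || choice1 == 1 ) && ( choice2 == 0 || choice2 == 1 ) );
  if( choice1 == 0 ) return 2 + (int)( value & 7 );      /* 0 + 3 bits */
  if( choice2 == 0 ) return 10 + (int)( value & 7 );     /* 1 + 0 + 3 bits */
  return 18 + (int)( value & 0xFF );                     /* 1 + 1 + 8 bits */
  }
```

The three branches cover the ranges 2 to 9, 10 to 17 and 18 to 273 without gaps or overlaps, which is why the two prefix bits are enough to pick the range.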
@ -828,7 +839,7 @@ order (from LSB to MSB). For distances >= 128, the 'direct_bits - 4'
part is coded with fixed 0.5 probability.
Bit sequence Description
--------------------------------------------------------------------------
------------------------------------------------------------------------
slot distances from 0 to 3
slot + direct_bits distances from 4 to 127
slot + (direct_bits - 4) + 4 bits distances from 128 to 2^32 - 1
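
The relation between a distance slot and the range of distances it covers can be sketched in C like this. The arithmetic follows the expressions used in the reference decoder ('direct_bits = ( dis_slot >> 1 ) - 1' and the '( 2 | ( dis_slot & 1 ) ) << direct_bits' base); the helper names and the assumption that slots run from 0 to 63 are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of extra bits that follow a slot (0 for slots 0 to 3). */
static int slot_direct_bits( const unsigned slot )
  {
  assert( slot < 64 );              /* assumed slot range of this sketch */
  return ( slot < 4 ) ? 0 : (int)( slot >> 1 ) - 1;
  }

/* Smallest distance coded by a slot; the extra bits are added to it. */
static uint32_t slot_base_distance( const unsigned slot )
  {
  if( slot < 4 ) return slot;       /* slots 0 to 3 are the distance itself */
  return (uint32_t)( 2 | ( slot & 1 ) ) << slot_direct_bits( slot );
  }
```

For example, slot 13 has 5 direct bits and a base of 96, so it covers distances 96 to 127, while slot 14 starts at 128, matching the boundaries given in the table.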
@ -864,7 +875,7 @@ byte. 'rep' is any one of 'rep0', 'rep1', 'rep2' or 'rep3'. The types
of previous sequences corresponding to each state are:
State Types of previous sequences
--------------------------------------------------------
------------------------------------------------------
0 literal, literal, literal
1 match, literal, literal
2 rep or (!literal, shortrep), literal, literal
@ -881,24 +892,24 @@ State Types of previous sequences
The contexts for decoding the type of coding sequence are:
Name Indices Used when
---------------------------------------------------------------------------
bm_match state, pos_state sequence start
bm_rep state after sequence 1
bm_rep0 state after sequence 11
bm_rep1 state after sequence 111
bm_rep2 state after sequence 1111
bm_len state, pos_state after sequence 110
Name Indices Used when
-----------------------------------------------------------------------
bm_match state, pos_state sequence start
bm_rep state after sequence 1
bm_rep0 state after sequence 11
bm_rep1 state after sequence 111
bm_rep2 state after sequence 1111
bm_len state, pos_state after sequence 110
The contexts for decoding distances are:
Name Indices Used when
---------------------------------------------------------------------------
bm_dis_slot len_state, bit tree distance start
bm_dis reverse bit tree after slots 4 to 13
bm_align reverse bit tree for distances >= 128, after
fixed probability bits
Name Indices Used when
------------------------------------------------------------------------
bm_dis_slot len_state, bit tree distance start
bm_dis reverse bit tree after slots 4 to 13
bm_align reverse bit tree for distances >= 128, after fixed
probability bits
There are two separate sets of contexts for lengths ('Len_model' in
@ -906,7 +917,7 @@ the source). One for normal matches, the other for repeated matches. The
contexts in each Len_model are (see 'decode_len' in the source):
Name Indices Used when
---------------------------------------------------------------------------
------------------------------------------------------------------------
choice1 none length start
choice2 none after sequence 1
bm_low pos_state, bit tree after sequence 0
@ -1013,7 +1024,11 @@ compressed file (bugs in the system libraries, memory errors, etc).
Therefore, if the data you are going to compress are important, give the
'--keep' option to clzip and don't remove the original file until you
verify the compressed file with a command like
'clzip -cd file.lz | cmp file -'.
'clzip -cd file.lz | cmp file -'. Most RAM errors happening during
compression can only be detected by comparing the compressed file with
the original because the corruption happens before clzip compresses the
RAM contents, resulting in a valid compressed file containing wrong
data.
Example 1: Replace a regular file with its compressed version 'file.lz'
@ -1106,7 +1121,7 @@ Appendix A Reference source code
********************************
/* Lzd - Educational decompressor for the lzip format
Copyright (C) 2013-2018 Antonio Diaz Diaz.
Copyright (C) 2013-2019 Antonio Diaz Diaz.
This program is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
@ -1136,7 +1151,7 @@ Appendix A Reference source code
#include <cstring>
#include <stdint.h>
#include <unistd.h>
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
#include <fcntl.h>
#include <io.h>
#endif
@ -1237,9 +1252,9 @@ public:
const CRC32 crc32;
typedef uint8_t File_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
typedef uint8_t Lzip_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
typedef uint8_t File_trailer[20];
typedef uint8_t Lzip_trailer[20];
// 0-3 CRC32 of the uncompressed data
// 4-11 size of the uncompressed data
// 12-19 member size including header and trailer
@ -1433,6 +1448,7 @@ bool LZ_decoder::decode_member() // Returns false if error
const int pos_state = data_position() & pos_state_mask;
if( rdec.decode_bit( bm_match[state()][pos_state] ) == 0 ) // 1st bit
{
// literal byte
const uint8_t prev_byte = peek( 0 );
const int literal_state = prev_byte >> ( 8 - literal_context_bits );
Bit_model * const bm = bm_literal[literal_state];
@ -1441,67 +1457,66 @@ bool LZ_decoder::decode_member() // Returns false if error
else
put_byte( rdec.decode_matched( bm, peek( rep0 ) ) );
state.set_char();
continue;
}
else // match or repeated match
// match or repeated match
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
{
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
{
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
{
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
}
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
}
else
{
unsigned distance;
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
distance = rep1;
else
{
unsigned distance;
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
distance = rep1;
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
distance = rep2;
else
{
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
distance = rep2;
else
{ distance = rep3; rep3 = rep2; }
rep2 = rep1;
}
rep1 = rep0;
rep0 = distance;
{ distance = rep3; rep3 = rep2; }
rep2 = rep1;
}
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
rep1 = rep0;
rep0 = distance;
}
else // match
{
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
const int len_state = std::min( len - min_match_len, len_states - 1 );
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
if( rep0 >= start_dis_model )
{
const unsigned dis_slot = rep0;
const int direct_bits = ( dis_slot >> 1 ) - 1;
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
direct_bits );
else
{
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
if( rep0 == 0xFFFFFFFFU ) // marker found
{
flush_data();
return ( len == min_match_len ); // End Of Stream marker
}
}
}
state.set_match();
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
{ flush_data(); return false; }
}
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
}
else // match
{
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
const int len_state = std::min( len - min_match_len, len_states - 1 );
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
if( rep0 >= start_dis_model )
{
const unsigned dis_slot = rep0;
const int direct_bits = ( dis_slot >> 1 ) - 1;
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
direct_bits );
else
{
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
if( rep0 == 0xFFFFFFFFU ) // marker found
{
flush_data();
return ( len == min_match_len ); // End Of Stream marker
}
}
}
state.set_match();
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
{ flush_data(); return false; }
}
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
}
flush_data();
return false;
@ -1519,7 +1534,7 @@ int main( const int argc, const char * const argv[] )
"It is not safe to use lzd for any real work.\n"
"\nUsage: %s < file.lz > file\n", argv[0] );
std::printf( "Lzd decompresses from standard input to standard output.\n"
"\nCopyright (C) 2018 Antonio Diaz Diaz.\n"
"\nCopyright (C) 2019 Antonio Diaz Diaz.\n"
"This is free software: you are free to change and redistribute it.\n"
"There is NO WARRANTY, to the extent permitted by law.\n"
"Report bugs to lzip-bug@nongnu.org\n"
@ -1527,14 +1542,14 @@ int main( const int argc, const char * const argv[] )
return 0;
}
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
setmode( fileno( stdin ), O_BINARY );
setmode( fileno( stdout ), O_BINARY );
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
setmode( STDIN_FILENO, O_BINARY );
setmode( STDOUT_FILENO, O_BINARY );
#endif
for( bool first_member = true; ; first_member = false )
{
File_header header; // verify header
Lzip_header header; // verify header
for( int i = 0; i < 6; ++i ) header[i] = std::getc( stdin );
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
{
@ -1553,7 +1568,7 @@ int main( const int argc, const char * const argv[] )
if( !decoder.decode_member() )
{ std::fputs( "Data error\n", stderr ); return 2; }
File_trailer trailer; // verify trailer
Lzip_trailer trailer; // verify trailer
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
unsigned crc = 0;
for( int i = 3; i >= 0; --i ) { crc <<= 8; crc += trailer[i]; }
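
The trailer layout described in the comments of the reference source (bytes 0-3 CRC32, 4-11 data size, 12-19 member size) can be read with a small C helper like the one below. This is a sketch, not part of lzd or clzip: the names `get_le`, `Trailer_fields` and `parse_trailer` are hypothetical, and reading all three fields as little-endian is assumed from the way the CRC is assembled in the loop above.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian unsigned integer of 'size' bytes (1 to 8). */
static uint64_t get_le( const uint8_t * const buf, const int size )
  {
  uint64_t value = 0;
  int i;
  assert( size >= 1 && size <= 8 );
  for( i = size - 1; i >= 0; --i ) { value <<= 8; value += buf[i]; }
  return value;
  }

struct Trailer_fields             /* hypothetical helper type */
  {
  uint32_t data_crc;              /* CRC32 of the uncompressed data */
  uint64_t data_size;             /* size of the uncompressed data */
  uint64_t member_size;           /* member size including header and trailer */
  };

static struct Trailer_fields parse_trailer( const uint8_t trailer[20] )
  {
  struct Trailer_fields t;
  t.data_crc = (uint32_t)get_le( trailer, 4 );
  t.data_size = get_le( trailer + 4, 8 );
  t.member_size = get_le( trailer + 12, 8 );
  return t;
  }
```

A caller would compare `data_crc` and `data_size` with the values computed while decompressing, and `member_size` with the number of bytes consumed for the member.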
@ -1598,20 +1613,21 @@ Concept index

Tag Table:
Node: Top210
Node: Introduction1210
Node: Output6491
Node: Invoking clzip8011
Ref: --trailing-error8577
Node: Quality assurance16230
Node: File format24640
Node: Algorithm27045
Node: Stream format29875
Node: Trailing data40616
Node: Examples42894
Ref: concat-example44076
Node: Problems45121
Node: Reference source code45657
Node: Concept index59974
Node: Introduction1209
Node: Output6498
Node: Invoking clzip8018
Ref: --trailing-error8648
Node: Quality assurance16666
Node: File format25271
Ref: coded-dict-size26564
Node: Algorithm27674
Node: Stream format30504
Node: Trailing data41156
Node: Examples43434
Ref: concat-example44866
Node: Problems45911
Node: Reference source code46447
Node: Concept index60660

End Tag Table


@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 6 February 2018
@set VERSION 1.10
@set UPDATED 3 January 2019
@set VERSION 1.11
@dircategory Data Compression
@direntry
@ -50,7 +50,7 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
Copyright @copyright{} 2010-2018 Antonio Diaz Diaz.
Copyright @copyright{} 2010-2019 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission
to copy, distribute and modify it.
@ -60,20 +60,20 @@ to copy, distribute and modify it.
@chapter Introduction
@cindex introduction
Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
newer. As clzip is written in C, it may be easier to integrate in
applications like package managers, embedded devices, or systems lacking
a C++ compiler.
@uref{http://www.nongnu.org/lzip/clzip.html,,Clzip} is a C language version
of lzip, fully compatible with @w{lzip 1.4} or newer. As clzip is written in
C, it may be easier to integrate in applications like package managers,
embedded devices, or systems lacking a C++ compiler.
Lzip is a lossless data compressor with a user interface similar to the
one of gzip or bzip2. Lzip can compress about as fast as gzip
@w{(lzip -0)}, or compress most files more than bzip2 @w{(lzip -9)}.
Decompression speed is intermediate between gzip and bzip2. Lzip is
better than gzip and bzip2 from a data recovery perspective.
@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} is a lossless data
compressor with a user interface similar to the one of gzip or bzip2. Lzip
can compress about as fast as gzip @w{(lzip -0)} or compress most files more
than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between gzip
and bzip2. Lzip is better than gzip and bzip2 from a data recovery
perspective.
The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder
availability:
The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:
@itemize @bullet
@item
@ -116,15 +116,14 @@ though, that the check occurs upon decompression, so it can only tell
you that something is wrong. It can't help you recover the original
uncompressed data.
Clzip uses the same well-defined exit status values used by lzip and
bzip2, which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs like
tar or zutils.
Clzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.
Clzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit.
Clzip will automatically use for each file the largest dictionary size
that does not exceed neither the file size nor the limit given. Keep in
mind that the decompression memory requirement is affected at
compression time by the choice of dictionary size limit.
The amount of memory required for compression is about 1 or 2 times the
dictionary size limit (1 if input file size is less than dictionary size
@ -146,7 +145,7 @@ file from that of the compressed file as follows:
(De)compressing a file is much like copying or moving it; therefore clzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as "cp -p" does. (If the user ID or
possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).
@ -252,6 +251,7 @@ Print an informative help message describing the options and exit.
@item -V
@itemx --version
Print the version number of clzip on the standard output and exit.
This version number should be included in all bug reports.
@anchor{--trailing-error}
@item -a
@ -333,12 +333,13 @@ Quiet operation. Suppress all messages.
@item -s @var{bytes}
@itemx --dictionary-size=@var{bytes}
When compressing, set the dictionary size limit in bytes. Clzip will use
the smallest possible dictionary size for each file without exceeding
this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
that dictionary sizes are quantized. If the specified size does not
match one of the valid sizes, it will be rounded upwards by adding up to
@w{(@var{bytes} / 8)} to it.
for each file the largest dictionary size that does not exceed neither
the file size nor this limit. Valid values range from @w{4 KiB} to
@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
coded in just one byte (@pxref{coded-dict-size}). If the specified size
does not match one of the valid sizes, it will be rounded upwards by
adding up to @w{(@var{bytes} / 8)} to it.
For maximum compression you should use a dictionary size limit as large
as possible, but keep in mind that the decompression memory requirement
@ -376,18 +377,23 @@ ASCII characters.@*
Two or more @samp{-v} options show the progress of (de)compression.
@item -0 .. -9
Set the compression parameters (dictionary size and match length limit)
as shown in the table below. The default compression level is @samp{-6}.
Note that @samp{-9} can be much slower than @samp{-0}. These options
have no effect when decompressing, testing or listing.
Compression level. Set the compression parameters (dictionary size and
match length limit) as shown in the table below. The default compression
level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
@samp{-9} can be much slower than @samp{-0}. These options have no
effect when decompressing, testing or listing.
The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc, you may need to use the @samp{--dictionary-size} and
@samp{--match-length} options directly to achieve optimal performance.
@multitable {Level} {Dictionary size} {Match length limit}
@item Level @tab Dictionary size @tab Match length limit
If several compression levels or @samp{-s} or @samp{-m} options are
given, the last setting is used. For example @w{@samp{-9 -s64MiB}} is
equivalent to @w{@samp{-s64MiB -m273}}
@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
@item -0 @tab 64 KiB @tab 16 bytes
@item -1 @tab 1 MiB @tab 5 bytes
@item -2 @tab 1.5 MiB @tab 6 bytes
@ -446,10 +452,10 @@ is to make it so complicated that there are no obvious deficiencies. The
first method is far more difficult.@*
--- C.A.R. Hoare
Lzip has been designed, written and tested with great care to be the
standard general-purpose compressor for unix-like systems. This chapter
describes the lessons learned from previous compressors (gzip and
bzip2), and their application to the design of lzip.
Lzip has been designed, written and tested with great care to replace
gzip and bzip2 as the standard general-purpose compressed format for
unix-like systems. This chapter describes the lessons learned from
these previous formats, and their application to the design of lzip.
@sp 1
@section Format design
@ -489,18 +495,21 @@ is extraordinarily safe. It provides embedded error detection. Any
distance larger than the dictionary size acts as a forbidden symbol,
allowing the decompressor to detect the approximate position of errors,
and leaving very little work for the check sequence (CRC and data sizes)
in the detection of errors. Lzip is usually able to detect all posible
in the detection of errors. Lzip is usually able to detect all possible
bit flips in the compressed data without resorting to the check
sequence. It would be difficult to write an automatic recovery tool like
lziprecover for the gzip format. And, as far as I know, it has never
been written.
Lzip, like gzip and bzip2, uses a CRC32 to check the integrity of the
decompressed data because it provides more accurate error detection than
CRC64 up to a compressed size of about @w{16 GiB}, a size larger than
that of most files. In the case of lzip, the additional detection
decompressed data because it provides optimal accuracy in the detection
of errors up to a compressed size of about @w{16 GiB}, a size larger
than that of most files. In the case of lzip, the additional detection
capability of the decompressor reduces the probability of undetected
errors more than a million times beyond what the CRC32 alone provides.
errors about four million times more, resulting in a combined integrity
checking optimally accurate for any member size produced by lzip.
Preliminary results suggest that the lzip format is safe enough to be
used in critical safety avionics systems.
The lzip format is designed for long-term archiving. Therefore it
excludes any unneeded features that may interfere with the future
@ -559,7 +568,7 @@ size. The size of any file larger than @w{4 GiB} gets truncated.
Bzip2 does not store the uncompressed size of the file.
The lzip format provides a 64-bit field for the uncompressed size.
Additionaly, lzip produces multimember output automatically when the
Additionally, lzip produces multimember output automatically when the
size is too large for a single member, allowing for an unlimited
uncompressed size.
@ -614,10 +623,10 @@ vulnerability or false negative.
@item Dictionary size
Lzip automatically uses the smallest possible dictionary size for each
file. In addition to reducing the amount of memory required for
decompression, this feature also minimizes the probability of being
affected by RAM errors during compression.
Lzip automatically adapts the dictionary size to the size of each file.
In addition to reducing the amount of memory required for decompression,
this feature also minimizes the probability of being affected by RAM
errors during compression. @c key4_mask
@item Exit status
@ -674,12 +683,13 @@ A four byte string, identifying the lzip format, with the value "LZIP"
@item VN (version number, 1 byte)
Just in case something needs to be modified in the future. 1 for now.
@anchor{coded-dict-size}
@item DS (coded dictionary size, 1 byte)
The dictionary size is calculated by taking a power of 2 (the base size)
and substracting from it a fraction between 0/16 and 7/16 of the base
and subtracting from it a fraction between 0/16 and 7/16 of the base
size.@*
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
Bits 7-5 contain the numerator of the fraction (0 to 7) to substract
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
from the base size to obtain the dictionary size.@*
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
Valid values for dictionary size range from 4 KiB to 512 MiB.
@ -939,7 +949,7 @@ are:
@sp 1
The contexts for decoding the type of coding sequence are:
@multitable @columnfractions .2 .4 .4
@multitable @columnfractions .2 .35 .45
@headitem Name @tab Indices @tab Used when
@item bm_match @tab state, pos_state @tab sequence start
@item bm_rep @tab state @tab after sequence 1
@ -952,7 +962,7 @@ The contexts for decoding the type of coding sequence are:
@sp 1
The contexts for decoding distances are:
@multitable @columnfractions .2 .4 .4
@multitable @columnfractions .2 .3 .5
@headitem Name @tab Indices @tab Used when
@item bm_dis_slot @tab len_state, bit tree @tab distance start
@item bm_dis @tab reverse bit tree @tab after slots 4 to 13
@ -1073,9 +1083,12 @@ where a file containing trailing data must be rejected, the option
WARNING! Even if clzip is bug-free, other causes may result in a corrupt
compressed file (bugs in the system libraries, memory errors, etc).
Therefore, if the data you are going to compress are important, give the
@samp{--keep} option to clzip and don't remove the original file until
you verify the compressed file with a command like
@w{@samp{clzip -cd file.lz | cmp file -}}.
@samp{--keep} option to clzip and don't remove the original file until you
verify the compressed file with a command like
@w{@samp{clzip -cd file.lz | cmp file -}}. Most RAM errors happening during
compression can only be detected by comparing the compressed file with the
original because the corruption happens before clzip compresses the RAM
contents, resulting in a valid compressed file containing wrong data.
@sp 1
@noindent
@ -1203,7 +1216,7 @@ find by running @w{@code{clzip --version}}.
@verbatim
/* Lzd - Educational decompressor for the lzip format
Copyright (C) 2013-2018 Antonio Diaz Diaz.
Copyright (C) 2013-2019 Antonio Diaz Diaz.
This program is free software. Redistribution and use in source and
binary forms, with or without modification, are permitted provided
@ -1233,7 +1246,7 @@ find by running @w{@code{clzip --version}}.
#include <cstring>
#include <stdint.h>
#include <unistd.h>
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
#include <fcntl.h>
#include <io.h>
#endif
@ -1334,9 +1347,9 @@ public:
const CRC32 crc32;
typedef uint8_t File_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
typedef uint8_t Lzip_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
typedef uint8_t File_trailer[20];
typedef uint8_t Lzip_trailer[20];
// 0-3 CRC32 of the uncompressed data
// 4-11 size of the uncompressed data
// 12-19 member size including header and trailer
@ -1530,6 +1543,7 @@ bool LZ_decoder::decode_member() // Returns false if error
const int pos_state = data_position() & pos_state_mask;
if( rdec.decode_bit( bm_match[state()][pos_state] ) == 0 ) // 1st bit
{
// literal byte
const uint8_t prev_byte = peek( 0 );
const int literal_state = prev_byte >> ( 8 - literal_context_bits );
Bit_model * const bm = bm_literal[literal_state];
@ -1538,67 +1552,66 @@ bool LZ_decoder::decode_member() // Returns false if error
else
put_byte( rdec.decode_matched( bm, peek( rep0 ) ) );
state.set_char();
continue;
}
else // match or repeated match
// match or repeated match
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
{
int len;
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
{
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
{
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
}
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
}
else
{
unsigned distance;
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
distance = rep1;
else
{
unsigned distance;
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
distance = rep1;
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
distance = rep2;
else
{
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
distance = rep2;
else
{ distance = rep3; rep3 = rep2; }
rep2 = rep1;
}
rep1 = rep0;
rep0 = distance;
{ distance = rep3; rep3 = rep2; }
rep2 = rep1;
}
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
rep1 = rep0;
rep0 = distance;
}
else // match
{
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
const int len_state = std::min( len - min_match_len, len_states - 1 );
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
if( rep0 >= start_dis_model )
{
const unsigned dis_slot = rep0;
const int direct_bits = ( dis_slot >> 1 ) - 1;
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
direct_bits );
else
{
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
if( rep0 == 0xFFFFFFFFU ) // marker found
{
flush_data();
return ( len == min_match_len ); // End Of Stream marker
}
}
}
state.set_match();
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
{ flush_data(); return false; }
}
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
state.set_rep();
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
}
else // match
{
rep3 = rep2; rep2 = rep1; rep1 = rep0;
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
const int len_state = std::min( len - min_match_len, len_states - 1 );
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
if( rep0 >= start_dis_model )
{
const unsigned dis_slot = rep0;
const int direct_bits = ( dis_slot >> 1 ) - 1;
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
if( dis_slot < end_dis_model )
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
direct_bits );
else
{
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
if( rep0 == 0xFFFFFFFFU ) // marker found
{
flush_data();
return ( len == min_match_len ); // End Of Stream marker
}
}
}
state.set_match();
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
{ flush_data(); return false; }
}
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
}
flush_data();
return false;
@ -1616,7 +1629,7 @@ int main( const int argc, const char * const argv[] )
"It is not safe to use lzd for any real work.\n"
"\nUsage: %s < file.lz > file\n", argv[0] );
std::printf( "Lzd decompresses from standard input to standard output.\n"
"\nCopyright (C) 2018 Antonio Diaz Diaz.\n"
"\nCopyright (C) 2019 Antonio Diaz Diaz.\n"
"This is free software: you are free to change and redistribute it.\n"
"There is NO WARRANTY, to the extent permitted by law.\n"
"Report bugs to lzip-bug@nongnu.org\n"
@ -1624,14 +1637,14 @@ int main( const int argc, const char * const argv[] )
return 0;
}
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
setmode( fileno( stdin ), O_BINARY );
setmode( fileno( stdout ), O_BINARY );
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
setmode( STDIN_FILENO, O_BINARY );
setmode( STDOUT_FILENO, O_BINARY );
#endif
for( bool first_member = true; ; first_member = false )
{
File_header header; // verify header
Lzip_header header; // verify header
for( int i = 0; i < 6; ++i ) header[i] = std::getc( stdin );
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
{
@ -1650,7 +1663,7 @@ int main( const int argc, const char * const argv[] )
if( !decoder.decode_member() )
{ std::fputs( "Data error\n", stderr ); return 2; }
File_trailer trailer; // verify trailer
Lzip_trailer trailer; // verify trailer
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
unsigned crc = 0;
for( int i = 3; i >= 0; --i ) { crc <<= 8; crc += trailer[i]; }


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -325,7 +325,7 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
--prev_index;
else /* prev_index2 >= 0 */
prev_index = prev_index2;
cur_state = 8; /* St_set_char_rep(); */
cur_state = St_set_char_rep();
}
cur_trial->state = cur_state;
for( i = 0; i < num_rep_distances; ++i )
@ -496,7 +496,7 @@ bool LZe_encode_member( struct LZ_encoder * const e,
const unsigned long long member_size )
{
const unsigned long long member_size_limit =
member_size - Ft_size - max_marker_size;
member_size - Lt_size - max_marker_size;
const bool best = ( e->match_len_limit > 12 );
const int dis_price_count = best ? 1 : 512;
const int align_price_count = best ? 1 : dis_align_size;
@ -510,7 +510,7 @@ bool LZe_encode_member( struct LZ_encoder * const e,
for( i = 0; i < num_rep_distances; ++i ) reps[i] = 0;
if( Mb_data_position( &e->eb.mb ) != 0 ||
Re_member_position( &e->eb.renc ) != Fh_size )
Re_member_position( &e->eb.renc ) != Lh_size )
return false; /* can be called only once */
if( !Mb_data_finished( &e->eb.mb ) ) /* encode first byte */


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -54,7 +54,8 @@ void Mb_normalize_pos( struct Matchfinder_base * const mb )
if( !mb->at_stream_end )
{
int i;
const int offset = mb->pos - mb->before_size - mb->dictionary_size;
/* offset is int32_t for the min below */
const int32_t offset = mb->pos - mb->before_size - mb->dictionary_size;
const int size = mb->stream_pos - offset;
memmove( mb->buffer, mb->buffer + offset, size );
mb->partial_data_pos += offset;
@ -110,7 +111,7 @@ bool Mb_init( struct Matchfinder_base * const mb, const int before_size,
size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
if( mb->dictionary_size > 1 << 26 ) /* 64 MiB */
size >>= 1;
mb->key4_mask = size - 1;
mb->key4_mask = size - 1; /* increases with dictionary size */
size += num_prev_positions23;
mb->num_prev_positions = size;
@ -171,15 +172,15 @@ void LZeb_full_flush( struct LZ_encoder_base * const eb, const State state )
{
int i;
const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask;
File_trailer trailer;
Lzip_trailer trailer;
Re_encode_bit( &eb->renc, &eb->bm_match[state][pos_state], 1 );
Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 );
LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len, pos_state );
Re_flush( &eb->renc );
Ft_set_data_crc( trailer, LZeb_crc( eb ) );
Ft_set_data_size( trailer, Mb_data_position( &eb->mb ) );
Ft_set_member_size( trailer, Re_member_position( &eb->renc ) + Ft_size );
for( i = 0; i < Ft_size; ++i )
Lt_set_data_crc( trailer, LZeb_crc( eb ) );
Lt_set_data_size( trailer, Mb_data_position( &eb->mb ) );
Lt_set_member_size( trailer, Re_member_position( &eb->renc ) + Lt_size );
for( i = 0; i < Lt_size; ++i )
Re_put_byte( &eb->renc, trailer[i] );
Re_flush_data( &eb->renc );
}


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -237,7 +237,7 @@ struct Range_encoder
unsigned ff_count;
int outfd; /* output file descriptor */
uint8_t cache;
File_header header;
Lzip_header header;
};
void Re_flush_data( struct Range_encoder * const renc );
@ -273,8 +273,8 @@ static inline void Re_reset( struct Range_encoder * const renc,
renc->range = 0xFFFFFFFFU;
renc->ff_count = 0;
renc->cache = 0;
Fh_set_dictionary_size( renc->header, dictionary_size );
for( i = 0; i < Fh_size; ++i )
Lh_set_dictionary_size( renc->header, dictionary_size );
for( i = 0; i < Lh_size; ++i )
Re_put_byte( renc, renc->header[i] );
}
@ -284,7 +284,7 @@ static inline bool Re_init( struct Range_encoder * const renc,
renc->buffer = (uint8_t *)malloc( re_buffer_size );
if( !renc->buffer ) return false;
renc->outfd = ofd;
Fh_set_magic( renc->header );
Lh_set_magic( renc->header );
Re_reset( renc, dictionary_size );
return true;
}


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -74,14 +74,14 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe,
const unsigned long long member_size )
{
const unsigned long long member_size_limit =
member_size - Ft_size - max_marker_size;
member_size - Lt_size - max_marker_size;
int rep = 0, i;
int reps[num_rep_distances];
State state = 0;
for( i = 0; i < num_rep_distances; ++i ) reps[i] = 0;
if( Mb_data_position( &fe->eb.mb ) != 0 ||
Re_member_position( &fe->eb.renc ) != Fh_size )
Re_member_position( &fe->eb.renc ) != Lh_size )
return false; /* can be called only once */
if( !Mb_data_finished( &fe->eb.mb ) ) /* encode first byte */


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -33,16 +33,16 @@ int FLZe_longest_match_len( struct FLZ_encoder * const fe, int * const distance
static inline void FLZe_update_and_move( struct FLZ_encoder * const fe, int n )
{
struct Matchfinder_base * const mb = &fe->eb.mb;
while( --n >= 0 )
{
if( Mb_available_bytes( &fe->eb.mb ) >= 4 )
if( Mb_available_bytes( mb ) >= 4 )
{
fe->key4 = ( ( fe->key4 << 4 ) ^ fe->eb.mb.buffer[fe->eb.mb.pos+3] ) &
fe->eb.mb.key4_mask;
fe->eb.mb.pos_array[fe->eb.mb.cyclic_pos] = fe->eb.mb.prev_positions[fe->key4];
fe->eb.mb.prev_positions[fe->key4] = fe->eb.mb.pos + 1;
fe->key4 = ( ( fe->key4 << 4 ) ^ mb->buffer[mb->pos+3] ) & mb->key4_mask;
mb->pos_array[mb->cyclic_pos] = mb->prev_positions[fe->key4];
mb->prev_positions[fe->key4] = mb->pos + 1;
}
Mb_move_pos( &fe->eb.mb );
Mb_move_pos( mb );
}
}


@ -1,272 +0,0 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include "lzip.h"
#include "file_index.h"
static int seek_read( const int fd, uint8_t * const buf, const int size,
const long long pos )
{
if( lseek( fd, pos, SEEK_SET ) == pos )
return readblock( fd, buf, size );
return 0;
}
static bool add_error( struct File_index * const fi, const char * const msg )
{
const int len = strlen( msg );
void * tmp = resize_buffer( fi->error, fi->error_size + len + 1 );
if( !tmp ) return false;
fi->error = (char *)tmp;
strncpy( fi->error + fi->error_size, msg, len + 1 );
fi->error_size += len;
return true;
}
static bool push_back_member( struct File_index * const fi,
const long long dp, const long long ds,
const long long mp, const long long ms,
const unsigned dict_size )
{
struct Member * p;
void * tmp = resize_buffer( fi->member_vector,
( fi->members + 1 ) * sizeof fi->member_vector[0] );
if( !tmp )
{ add_error( fi, "Not enough memory." ); fi->retval = 1; return false; }
fi->member_vector = (struct Member *)tmp;
p = &(fi->member_vector[fi->members]);
init_member( p, dp, ds, mp, ms, dict_size );
++fi->members;
return true;
}
static void Fi_free_member_vector( struct File_index * const fi )
{
if( fi->member_vector )
{ free( fi->member_vector ); fi->member_vector = 0; }
fi->members = 0;
}
static void Fi_reverse_member_vector( struct File_index * const fi )
{
struct Member tmp;
long i;
for( i = 0; i < fi->members / 2; ++i )
{
tmp = fi->member_vector[i];
fi->member_vector[i] = fi->member_vector[fi->members-i-1];
fi->member_vector[fi->members-i-1] = tmp;
}
}
static void Fi_set_errno_error( struct File_index * const fi,
const char * const msg )
{
add_error( fi, msg ); add_error( fi, strerror( errno ) );
fi->retval = 1;
}
static void Fi_set_num_error( struct File_index * const fi,
const char * const msg, unsigned long long num )
{
char buf[80];
snprintf( buf, sizeof buf, "%s%llu", msg, num );
add_error( fi, buf );
fi->retval = 2;
}
/* If successful, push last member and set pos to member header. */
static bool Fi_skip_trailing_data( struct File_index * const fi,
const int fd, long long * const pos,
const bool ignore_trailing,
const bool loose_trailing )
{
enum { block_size = 16384,
buffer_size = block_size + Ft_size - 1 + Fh_size };
uint8_t buffer[buffer_size];
int bsize = *pos % block_size; /* total bytes in buffer */
int search_size, rd_size;
unsigned long long ipos;
int i;
if( bsize <= buffer_size - block_size ) bsize += block_size;
search_size = bsize; /* bytes to search for trailer */
rd_size = bsize; /* bytes to read from file */
ipos = *pos - rd_size; /* aligned to block_size */
if( *pos < min_member_size ) return false;
while( true )
{
const uint8_t max_msb = ( ipos + search_size ) >> 56;
if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
{ Fi_set_errno_error( fi, "Error seeking member trailer: " );
return false; }
for( i = search_size; i >= Ft_size; --i )
if( buffer[i-1] <= max_msb ) /* most significant byte of member_size */
{
File_header header;
File_trailer * trailer = (File_trailer *)( buffer + i - Ft_size );
const unsigned long long member_size = Ft_get_member_size( *trailer );
unsigned dictionary_size;
if( member_size == 0 )
{ while( i > Ft_size && buffer[i-9] == 0 ) --i; continue; }
if( member_size < min_member_size || member_size > ipos + i )
continue;
if( seek_read( fd, header, Fh_size,
ipos + i - member_size ) != Fh_size )
{ Fi_set_errno_error( fi, "Error reading member header: " );
return false; }
dictionary_size = Fh_get_dictionary_size( header );
if( !Fh_verify_magic( header ) || !Fh_verify_version( header ) ||
!isvalid_ds( dictionary_size ) ) continue;
if( Fh_verify_prefix( buffer + i, bsize - i ) )
{
add_error( fi, "Last member in input file is truncated or corrupt." );
fi->retval = 2; return false;
}
if( !loose_trailing && bsize - i >= Fh_size &&
Fh_verify_corrupt( buffer + i ) )
{ add_error( fi, corrupt_mm_msg ); fi->retval = 2; return false; }
if( !ignore_trailing )
{ add_error( fi, trailing_msg ); fi->retval = 2; return false; }
*pos = ipos + i - member_size;
return push_back_member( fi, 0, Ft_get_data_size( *trailer ), *pos,
member_size, dictionary_size );
}
if( ipos <= 0 )
{ Fi_set_num_error( fi, "Member size in trailer is corrupt at pos ",
*pos - 8 );
return false; }
bsize = buffer_size;
search_size = bsize - Fh_size;
rd_size = block_size;
ipos -= rd_size;
memcpy( buffer + rd_size, buffer, buffer_size - rd_size );
}
}
bool Fi_init( struct File_index * const fi, const int infd,
const bool ignore_trailing, const bool loose_trailing )
{
File_header header;
long long pos;
long i;
fi->member_vector = 0;
fi->error = 0;
fi->isize = lseek( infd, 0, SEEK_END );
fi->members = 0;
fi->error_size = 0;
fi->retval = 0;
if( fi->isize < 0 )
{ Fi_set_errno_error( fi, "Input file is not seekable: " ); return false; }
if( fi->isize < min_member_size )
{ add_error( fi, "Input file is too short." ); fi->retval = 2;
return false; }
if( fi->isize > INT64_MAX )
{ add_error( fi, "Input file is too long (2^63 bytes or more)." );
fi->retval = 2; return false; }
if( seek_read( infd, header, Fh_size, 0 ) != Fh_size )
{ Fi_set_errno_error( fi, "Error reading member header: " ); return false; }
if( !Fh_verify_magic( header ) )
{ add_error( fi, bad_magic_msg ); fi->retval = 2; return false; }
if( !Fh_verify_version( header ) )
{ add_error( fi, bad_version( Fh_version( header ) ) ); fi->retval = 2;
return false; }
if( !isvalid_ds( Fh_get_dictionary_size( header ) ) )
{ add_error( fi, bad_dict_msg ); fi->retval = 2; return false; }
pos = fi->isize; /* always points to a header or to EOF */
while( pos >= min_member_size )
{
File_trailer trailer;
unsigned long long member_size;
unsigned dictionary_size;
if( seek_read( infd, trailer, Ft_size, pos - Ft_size ) != Ft_size )
{ Fi_set_errno_error( fi, "Error reading member trailer: " ); break; }
member_size = Ft_get_member_size( trailer );
if( member_size < min_member_size || member_size > (unsigned long long)pos )
{
if( fi->members <= 0 )
{ if( Fi_skip_trailing_data( fi, infd, &pos, ignore_trailing,
loose_trailing ) ) continue; else return false; }
Fi_set_num_error( fi, "Member size in trailer is corrupt at pos ", pos - 8 );
break;
}
if( seek_read( infd, header, Fh_size, pos - member_size ) != Fh_size )
{ Fi_set_errno_error( fi, "Error reading member header: " ); break; }
dictionary_size = Fh_get_dictionary_size( header );
if( !Fh_verify_magic( header ) || !Fh_verify_version( header ) ||
!isvalid_ds( dictionary_size ) )
{
if( fi->members <= 0 )
{ if( Fi_skip_trailing_data( fi, infd, &pos, ignore_trailing,
loose_trailing ) ) continue; else return false; }
Fi_set_num_error( fi, "Bad header at pos ", pos - member_size );
break;
}
pos -= member_size;
if( !push_back_member( fi, 0, Ft_get_data_size( trailer ), pos,
member_size, dictionary_size ) )
return false;
}
if( pos != 0 || fi->members <= 0 )
{
Fi_free_member_vector( fi );
if( fi->retval == 0 )
{ add_error( fi, "Can't create file index." ); fi->retval = 2; }
return false;
}
Fi_reverse_member_vector( fi );
for( i = 0; i < fi->members - 1; ++i )
{
const long long end = block_end( fi->member_vector[i].dblock );
if( end < 0 || end > INT64_MAX )
{
Fi_free_member_vector( fi );
add_error( fi, "Data in input file is too long (2^63 bytes or more)." );
fi->retval = 2; return false;
}
fi->member_vector[i+1].dblock.pos = end;
}
return true;
}
void Fi_free( struct File_index * const fi )
{
Fi_free_member_vector( fi );
if( fi->error ) { free( fi->error ); fi->error = 0; }
fi->error_size = 0;
}

list.c

@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -25,7 +25,7 @@
#include <sys/stat.h>
#include "lzip.h"
#include "file_index.h"
#include "lzip_index.h"
static void list_line( const unsigned long long uncomp_size,
@ -53,7 +53,7 @@ int list_files( const char * const filenames[], const int num_filenames,
for( i = 0; i < num_filenames; ++i )
{
const char * input_filename;
struct File_index file_index;
struct Lzip_index lzip_index;
struct stat in_stats; /* not used */
int infd;
const bool from_stdin = ( strcmp( filenames[i], "-" ) == 0 );
@ -63,18 +63,18 @@ int list_files( const char * const filenames[], const int num_filenames,
open_instream( input_filename, &in_stats, true, true );
if( infd < 0 ) { if( retval < 1 ) retval = 1; continue; }
Fi_init( &file_index, infd, ignore_trailing, loose_trailing );
Li_init( &lzip_index, infd, ignore_trailing, loose_trailing );
close( infd );
if( file_index.retval != 0 )
if( lzip_index.retval != 0 )
{
show_file_error( input_filename, file_index.error, 0 );
if( retval < file_index.retval ) retval = file_index.retval;
Fi_free( &file_index ); continue;
show_file_error( input_filename, lzip_index.error, 0 );
if( retval < lzip_index.retval ) retval = lzip_index.retval;
Li_free( &lzip_index ); continue;
}
if( verbosity >= 0 )
{
const unsigned long long udata_size = Fi_udata_size( &file_index );
const unsigned long long cdata_size = Fi_cdata_size( &file_index );
const unsigned long long udata_size = Li_udata_size( &lzip_index );
const unsigned long long cdata_size = Li_cdata_size( &lzip_index );
total_comp += cdata_size; total_uncomp += udata_size; ++files;
if( first_post )
{
@ -87,23 +87,23 @@ int list_files( const char * const filenames[], const int num_filenames,
long long trailing_size;
unsigned dictionary_size = 0;
long i;
for( i = 0; i < file_index.members; ++i )
for( i = 0; i < lzip_index.members; ++i )
dictionary_size =
max( dictionary_size, Fi_dictionary_size( &file_index, i ) );
trailing_size = Fi_file_size( &file_index ) - cdata_size;
max( dictionary_size, Li_dictionary_size( &lzip_index, i ) );
trailing_size = Li_file_size( &lzip_index ) - cdata_size;
printf( "%s %5ld %6lld ", format_ds( dictionary_size ),
file_index.members, trailing_size );
lzip_index.members, trailing_size );
}
list_line( udata_size, cdata_size, input_filename );
if( verbosity >= 2 && file_index.members > 1 )
if( verbosity >= 2 && lzip_index.members > 1 )
{
long i;
fputs( " member data_pos data_size member_pos member_size\n", stdout );
for( i = 0; i < file_index.members; ++i )
for( i = 0; i < lzip_index.members; ++i )
{
const struct Block * db = Fi_dblock( &file_index, i );
const struct Block * mb = Fi_mblock( &file_index, i );
const struct Block * db = Li_dblock( &lzip_index, i );
const struct Block * mb = Li_mblock( &lzip_index, i );
printf( "%5ld %15llu %15llu %15llu %15llu\n",
i + 1, db->pos, db->size, mb->pos, mb->size );
}
@ -111,7 +111,7 @@ int list_files( const char * const filenames[], const int num_filenames,
}
fflush( stdout );
}
Fi_free( &file_index );
Li_free( &lzip_index );
}
if( verbosity >= 0 && files > 1 )
{

lzip.h

@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -36,6 +36,8 @@ static inline State St_set_char( const State st )
return next[st];
}
static inline State St_set_char_rep() { return 8; }
static inline State St_set_match( const State st )
{ return ( ( st < 7 ) ? 7 : 10 ); }
@ -119,7 +121,7 @@ static inline void Lm_init( struct Len_model * const lm )
/* defined in main.c */
extern int verbosity;
struct Pretty_print
struct Pretty_print /* requires global var 'int verbosity' */
{
const char * name;
char * padded_name;
@ -146,7 +148,7 @@ static inline void Pp_init( struct Pretty_print * const pp,
{
const char * const s = filenames[i];
const unsigned len = (strcmp( s, "-" ) == 0) ? stdin_name_len : strlen( s );
if( len > pp->longest_name ) pp->longest_name = len;
if( pp->longest_name < len ) pp->longest_name = len;
}
if( pp->longest_name == 0 ) pp->longest_name = stdin_name_len;
}
@ -220,43 +222,43 @@ static inline int real_bits( unsigned value )
}
static const uint8_t magic_string[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */
static const uint8_t lzip_magic[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */
typedef uint8_t File_header[6]; /* 0-3 magic bytes */
typedef uint8_t Lzip_header[6]; /* 0-3 magic bytes */
/* 4 version */
/* 5 coded_dict_size */
enum { Fh_size = 6 };
enum { Lh_size = 6 };
static inline void Fh_set_magic( File_header data )
{ memcpy( data, magic_string, 4 ); data[4] = 1; }
static inline void Lh_set_magic( Lzip_header data )
{ memcpy( data, lzip_magic, 4 ); data[4] = 1; }
static inline bool Fh_verify_magic( const File_header data )
{ return ( memcmp( data, magic_string, 4 ) == 0 ); }
static inline bool Lh_verify_magic( const Lzip_header data )
{ return ( memcmp( data, lzip_magic, 4 ) == 0 ); }
/* detect (truncated) header */
static inline bool Fh_verify_prefix( const File_header data, const int sz )
static inline bool Lh_verify_prefix( const Lzip_header data, const int sz )
{
int i; for( i = 0; i < sz && i < 4; ++i )
if( data[i] != magic_string[i] ) return false;
if( data[i] != lzip_magic[i] ) return false;
return ( sz > 0 );
}
/* detect corrupt header */
static inline bool Fh_verify_corrupt( const File_header data )
static inline bool Lh_verify_corrupt( const Lzip_header data )
{
int matches = 0;
int i; for( i = 0; i < 4; ++i )
if( data[i] == magic_string[i] ) ++matches;
if( data[i] == lzip_magic[i] ) ++matches;
return ( matches > 1 && matches < 4 );
}
static inline uint8_t Fh_version( const File_header data )
static inline uint8_t Lh_version( const Lzip_header data )
{ return data[4]; }
static inline bool Fh_verify_version( const File_header data )
static inline bool Lh_verify_version( const Lzip_header data )
{ return ( data[4] == 1 ); }
static inline unsigned Fh_get_dictionary_size( const File_header data )
static inline unsigned Lh_get_dictionary_size( const Lzip_header data )
{
unsigned sz = ( 1 << ( data[5] & 0x1F ) );
if( sz > min_dictionary_size )
@ -264,7 +266,7 @@ static inline unsigned Fh_get_dictionary_size( const File_header data )
return sz;
}
static inline bool Fh_set_dictionary_size( File_header data, const unsigned sz )
static inline bool Lh_set_dictionary_size( Lzip_header data, const unsigned sz )
{
if( !isvalid_ds( sz ) ) return false;
data[5] = real_bits( sz - 1 );
@ -281,43 +283,57 @@ static inline bool Fh_set_dictionary_size( File_header data, const unsigned sz )
}
typedef uint8_t File_trailer[20];
typedef uint8_t Lzip_trailer[20];
/* 0-3 CRC32 of the uncompressed data */
/* 4-11 size of the uncompressed data */
/* 12-19 member size including header and trailer */
enum { Lt_size = 20 };
enum { Ft_size = 20 };
static inline unsigned Ft_get_data_crc( const File_trailer data )
static inline unsigned Lt_get_data_crc( const Lzip_trailer data )
{
unsigned tmp = 0;
int i; for( i = 3; i >= 0; --i ) { tmp <<= 8; tmp += data[i]; }
return tmp;
}
static inline void Ft_set_data_crc( File_trailer data, unsigned crc )
static inline void Lt_set_data_crc( Lzip_trailer data, unsigned crc )
{ int i; for( i = 0; i <= 3; ++i ) { data[i] = (uint8_t)crc; crc >>= 8; } }
static inline unsigned long long Ft_get_data_size( const File_trailer data )
static inline unsigned long long Lt_get_data_size( const Lzip_trailer data )
{
unsigned long long tmp = 0;
int i; for( i = 11; i >= 4; --i ) { tmp <<= 8; tmp += data[i]; }
return tmp;
}
static inline void Ft_set_data_size( File_trailer data, unsigned long long sz )
static inline void Lt_set_data_size( Lzip_trailer data, unsigned long long sz )
{ int i; for( i = 4; i <= 11; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
static inline unsigned long long Ft_get_member_size( const File_trailer data )
static inline unsigned long long Lt_get_member_size( const Lzip_trailer data )
{
unsigned long long tmp = 0;
int i; for( i = 19; i >= 12; --i ) { tmp <<= 8; tmp += data[i]; }
return tmp;
}
static inline void Ft_set_member_size( File_trailer data, unsigned long long sz )
static inline void Lt_set_member_size( Lzip_trailer data, unsigned long long sz )
{ int i; for( i = 12; i <= 19; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
/* check internal consistency */
static inline bool Lt_verify_consistency( const Lzip_trailer data )
{
const unsigned crc = Lt_get_data_crc( data );
const unsigned long long dsize = Lt_get_data_size( data );
const unsigned long long msize = Lt_get_member_size( data );
const unsigned long long mlimit = ( 9 * dsize + 7 ) / 8 + min_member_size;
const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
if( msize < min_member_size ) return false;
if( mlimit > dsize && msize > mlimit ) return false;
if( dlimit > msize && dsize > dlimit ) return false;
return true;
}
static const char * const bad_magic_msg = "Bad magic number (file not in lzip format).";
static const char * const bad_dict_msg = "Invalid dictionary size in member header.";
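
The new 'Lt_verify_consistency' check above can be tried in isolation. The following is a minimal sketch, not part of the commit: it copies the limits from the diff and assumes that 'min_member_size' is 36 (that constant is defined in lzip.h but is not shown in this diff). The names 'Trailer', 'put_le', 'get_le', 'make_trailer' and 'trailer_is_consistent' are hypothetical stand-ins for the real 'Lzip_trailer' helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: min_member_size is 36 in lzip.h (not shown in this diff). */
enum { trailer_size = 20, assumed_min_member_size = 36 };

typedef uint8_t Trailer[trailer_size];	/* 0-3 CRC32, 4-11 data size, */
					/* 12-19 member size */

/* Store the low 'bytes' bytes of 'value' at data[first..], little endian. */
static void put_le( uint8_t * const data, const int first, const int bytes,
                    unsigned long long value )
  {
  int i;
  for( i = first; i < first + bytes; ++i )
    { data[i] = (uint8_t)value; value >>= 8; }
  }

/* Read 'bytes' little-endian bytes starting at data[first]. */
static unsigned long long get_le( const uint8_t * const data, const int first,
                                  const int bytes )
  {
  unsigned long long tmp = 0;
  int i;
  for( i = first + bytes - 1; i >= first; --i ) { tmp <<= 8; tmp += data[i]; }
  return tmp;
  }

/* Hypothetical helper: fill the three trailer fields. */
static void make_trailer( Trailer t, const unsigned crc,
                          const unsigned long long dsize,
                          const unsigned long long msize )
  {
  assert( t != NULL );
  put_le( t, 0, 4, crc ); put_le( t, 4, 8, dsize ); put_le( t, 12, 8, msize );
  }

/* Same tests as Lt_verify_consistency in the diff above. */
static bool trailer_is_consistent( const Trailer data )
  {
  const unsigned crc = (unsigned)get_le( data, 0, 4 );
  const unsigned long long dsize = get_le( data, 4, 8 );
  const unsigned long long msize = get_le( data, 12, 8 );
  const unsigned long long mlimit =
    ( 9 * dsize + 7 ) / 8 + assumed_min_member_size;
  const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
  if( ( crc == 0 ) != ( dsize == 0 ) ) return false;	/* CRC vs empty data */
  if( msize < assumed_min_member_size ) return false;	/* member too small */
  if( mlimit > dsize && msize > mlimit ) return false;	/* member too large */
  if( dlimit > msize && dsize > dlimit ) return false;	/* data too large */
  return true;
  }
```

The comparisons 'mlimit > dsize' and 'dlimit > msize' appear to guard against wrap-around of the unsigned limits, so a limit is only enforced when it did not overflow. A caller would build a trailer with 'make_trailer( t, crc, dsize, msize )' and then test it with 'trailer_is_consistent( t )'.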

lzip_index.c Normal file

@ -0,0 +1,273 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include "lzip.h"
#include "lzip_index.h"
static int seek_read( const int fd, uint8_t * const buf, const int size,
const long long pos )
{
if( lseek( fd, pos, SEEK_SET ) == pos )
return readblock( fd, buf, size );
return 0;
}
static bool add_error( struct Lzip_index * const li, const char * const msg )
{
const int len = strlen( msg );
void * tmp = resize_buffer( li->error, li->error_size + len + 1 );
if( !tmp ) return false;
li->error = (char *)tmp;
strncpy( li->error + li->error_size, msg, len + 1 );
li->error_size += len;
return true;
}
static bool push_back_member( struct Lzip_index * const li,
const long long dp, const long long ds,
const long long mp, const long long ms,
const unsigned dict_size )
{
struct Member * p;
void * tmp = resize_buffer( li->member_vector,
( li->members + 1 ) * sizeof li->member_vector[0] );
if( !tmp )
{ add_error( li, "Not enough memory." ); li->retval = 1; return false; }
li->member_vector = (struct Member *)tmp;
p = &(li->member_vector[li->members]);
init_member( p, dp, ds, mp, ms, dict_size );
++li->members;
return true;
}
static void Li_free_member_vector( struct Lzip_index * const li )
{
if( li->member_vector )
{ free( li->member_vector ); li->member_vector = 0; }
li->members = 0;
}
static void Li_reverse_member_vector( struct Lzip_index * const li )
{
struct Member tmp;
long i;
for( i = 0; i < li->members / 2; ++i )
{
tmp = li->member_vector[i];
li->member_vector[i] = li->member_vector[li->members-i-1];
li->member_vector[li->members-i-1] = tmp;
}
}
static void Li_set_errno_error( struct Lzip_index * const li,
const char * const msg )
{
add_error( li, msg ); add_error( li, strerror( errno ) );
li->retval = 1;
}
static void Li_set_num_error( struct Lzip_index * const li,
const char * const msg, unsigned long long num )
{
char buf[80];
snprintf( buf, sizeof buf, "%s%llu", msg, num );
add_error( li, buf );
li->retval = 2;
}
/* If successful, push last member and set pos to member header. */
static bool Li_skip_trailing_data( struct Lzip_index * const li,
const int fd, long long * const pos,
const bool ignore_trailing,
const bool loose_trailing )
{
enum { block_size = 16384,
buffer_size = block_size + Lt_size - 1 + Lh_size };
uint8_t buffer[buffer_size];
int bsize = *pos % block_size; /* total bytes in buffer */
int search_size, rd_size;
unsigned long long ipos;
int i;
if( *pos < min_member_size ) return false;
if( bsize <= buffer_size - block_size ) bsize += block_size;
search_size = bsize; /* bytes to search for trailer */
rd_size = bsize; /* bytes to read from file */
ipos = *pos - rd_size; /* aligned to block_size */
while( true )
{
const uint8_t max_msb = ( ipos + search_size ) >> 56;
if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
{ Li_set_errno_error( li, "Error seeking member trailer: " );
return false; }
for( i = search_size; i >= Lt_size; --i )
if( buffer[i-1] <= max_msb ) /* most significant byte of member_size */
{
Lzip_header header;
const Lzip_trailer * const trailer =
(const Lzip_trailer *)( buffer + i - Lt_size );
const unsigned long long member_size = Lt_get_member_size( *trailer );
unsigned dictionary_size;
if( member_size == 0 ) /* skip trailing zeros */
{ while( i > Lt_size && buffer[i-9] == 0 ) --i; continue; }
if( member_size > ipos + i || !Lt_verify_consistency( *trailer ) )
continue;
if( seek_read( fd, header, Lh_size,
ipos + i - member_size ) != Lh_size )
{ Li_set_errno_error( li, "Error reading member header: " );
return false; }
dictionary_size = Lh_get_dictionary_size( header );
if( !Lh_verify_magic( header ) || !Lh_verify_version( header ) ||
!isvalid_ds( dictionary_size ) ) continue;
if( Lh_verify_prefix( buffer + i, bsize - i ) )
{
add_error( li, "Last member in input file is truncated or corrupt." );
li->retval = 2; return false;
}
if( !loose_trailing && bsize - i >= Lh_size &&
Lh_verify_corrupt( buffer + i ) )
{ add_error( li, corrupt_mm_msg ); li->retval = 2; return false; }
if( !ignore_trailing )
{ add_error( li, trailing_msg ); li->retval = 2; return false; }
*pos = ipos + i - member_size;
return push_back_member( li, 0, Lt_get_data_size( *trailer ), *pos,
member_size, dictionary_size );
}
if( ipos <= 0 )
{ Li_set_num_error( li, "Bad trailer at pos ", *pos - Lt_size );
return false; }
bsize = buffer_size;
search_size = bsize - Lh_size;
rd_size = block_size;
ipos -= rd_size;
memcpy( buffer + rd_size, buffer, buffer_size - rd_size );
}
}
bool Li_init( struct Lzip_index * const li, const int infd,
const bool ignore_trailing, const bool loose_trailing )
{
Lzip_header header;
long long pos;
long i;
li->member_vector = 0;
li->error = 0;
li->insize = lseek( infd, 0, SEEK_END );
li->members = 0;
li->error_size = 0;
li->retval = 0;
if( li->insize < 0 )
{ Li_set_errno_error( li, "Input file is not seekable: " ); return false; }
if( li->insize < min_member_size )
{ add_error( li, "Input file is too short." ); li->retval = 2;
return false; }
if( li->insize > INT64_MAX )
{ add_error( li, "Input file is too long (2^63 bytes or more)." );
li->retval = 2; return false; }
if( seek_read( infd, header, Lh_size, 0 ) != Lh_size )
{ Li_set_errno_error( li, "Error reading member header: " ); return false; }
if( !Lh_verify_magic( header ) )
{ add_error( li, bad_magic_msg ); li->retval = 2; return false; }
if( !Lh_verify_version( header ) )
{ add_error( li, bad_version( Lh_version( header ) ) ); li->retval = 2;
return false; }
if( !isvalid_ds( Lh_get_dictionary_size( header ) ) )
{ add_error( li, bad_dict_msg ); li->retval = 2; return false; }
pos = li->insize; /* always points to a header or to EOF */
while( pos >= min_member_size )
{
Lzip_trailer trailer;
unsigned long long member_size;
unsigned dictionary_size;
if( seek_read( infd, trailer, Lt_size, pos - Lt_size ) != Lt_size )
{ Li_set_errno_error( li, "Error reading member trailer: " ); break; }
member_size = Lt_get_member_size( trailer );
if( member_size > (unsigned long long)pos || !Lt_verify_consistency( trailer ) )
{
if( li->members <= 0 )
{ if( Li_skip_trailing_data( li, infd, &pos, ignore_trailing,
loose_trailing ) ) continue; else return false; }
Li_set_num_error( li, "Bad trailer at pos ", pos - Lt_size );
break;
}
if( seek_read( infd, header, Lh_size, pos - member_size ) != Lh_size )
{ Li_set_errno_error( li, "Error reading member header: " ); break; }
dictionary_size = Lh_get_dictionary_size( header );
if( !Lh_verify_magic( header ) || !Lh_verify_version( header ) ||
!isvalid_ds( dictionary_size ) )
{
if( li->members <= 0 )
{ if( Li_skip_trailing_data( li, infd, &pos, ignore_trailing,
loose_trailing ) ) continue; else return false; }
Li_set_num_error( li, "Bad header at pos ", pos - member_size );
break;
}
pos -= member_size;
if( !push_back_member( li, 0, Lt_get_data_size( trailer ), pos,
member_size, dictionary_size ) )
return false;
}
if( pos != 0 || li->members <= 0 )
{
Li_free_member_vector( li );
if( li->retval == 0 )
{ add_error( li, "Can't create file index." ); li->retval = 2; }
return false;
}
Li_reverse_member_vector( li );
for( i = 0; ; ++i )
{
const long long end = block_end( li->member_vector[i].dblock );
if( end < 0 || end > INT64_MAX )
{
Li_free_member_vector( li );
add_error( li, "Data in input file is too long (2^63 bytes or more)." );
li->retval = 2; return false;
}
if( i + 1 >= li->members ) break;
li->member_vector[i+1].dblock.pos = end;
}
return true;
}
void Li_free( struct Lzip_index * const li )
{
Li_free_member_vector( li );
if( li->error ) { free( li->error ); li->error = 0; }
li->error_size = 0;
}
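
The loop above builds the index by walking the file backwards: it reads a trailer, takes the member size from it, jumps to the header that size implies, and repeats until it reaches offset 0. A minimal sketch of that idea on an in-memory buffer follows. It assumes the 20-byte lzip trailer keeps the member size in its last 8 bytes, little-endian. `count_members` and `make_fake_member` are hypothetical names used only for illustration, and the lower-bound check on the member size stands in for the real `Lt_verify_consistency`, which is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { trailer_size = 20, min_member_size = 36 };

/* Read the 8-byte little-endian member size stored at the end of a trailer. */
static unsigned long long get_member_size( const uint8_t * const trailer )
  {
  unsigned long long size = 0;
  int i;
  for( i = 19; i >= 12; --i ) size = ( size << 8 ) + trailer[i];
  return size;
  }

static bool has_magic( const uint8_t * const header )
  { return memcmp( header, "LZIP", 4 ) == 0; }

/* Walk the buffer from the end towards the start, one member at a time.
   Returns the number of members found, or -1 if a trailer or header looks
   bad or the walk does not end exactly at offset 0. */
static long count_members( const uint8_t * const buf, const long long size )
  {
  long long pos = size;		/* always points to a header or to the end */
  long members = 0;
  while( pos >= min_member_size )
    {
    const unsigned long long member_size =
      get_member_size( buf + pos - trailer_size );
    /* simplified stand-in for the trailer consistency check (assumption) */
    if( member_size < min_member_size ||
        member_size > (unsigned long long)pos ) return -1;
    if( !has_magic( buf + pos - member_size ) ) return -1;
    pos -= member_size;
    ++members;
    }
  return ( pos == 0 && members > 0 ) ? members : -1;
  }

/* Hypothetical helper: write a fake member made of the magic bytes, zero
   filler, and a trailer whose size field matches 'size'. */
static void make_fake_member( uint8_t * const buf, const unsigned size )
  {
  unsigned i;
  assert( size >= min_member_size );
  memset( buf, 0, size );
  memcpy( buf, "LZIP", 4 );
  buf[4] = 1;				/* version byte */
  for( i = 0; i < 8; ++i )
    buf[size - 8 + i] = (uint8_t)( ( (unsigned long long)size >> ( 8 * i ) ) & 0xFF );
  }
```

Unlike `Li_init`, the sketch does not try to skip trailing data after the last member. It only shows why every step needs both a plausible trailer and a valid header before `pos` is moved.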


@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -46,45 +46,45 @@ static inline void init_member( struct Member * const m,
{ init_block( &m->dblock, dp, ds ); init_block( &m->mblock, mp, ms );
m->dictionary_size = dict_size; }
struct File_index
struct Lzip_index
{
struct Member * member_vector;
char * error;
long long isize;
long long insize;
long members;
int error_size;
int retval;
};
bool Fi_init( struct File_index * const fi, const int infd,
bool Li_init( struct Lzip_index * const li, const int infd,
const bool ignore_trailing, const bool loose_trailing );
void Fi_free( struct File_index * const fi );
void Li_free( struct Lzip_index * const li );
static inline long long Fi_udata_size( const struct File_index * const fi )
static inline long long Li_udata_size( const struct Lzip_index * const li )
{
if( fi->members <= 0 ) return 0;
return block_end( fi->member_vector[fi->members-1].dblock );
if( li->members <= 0 ) return 0;
return block_end( li->member_vector[li->members-1].dblock );
}
static inline long long Fi_cdata_size( const struct File_index * const fi )
static inline long long Li_cdata_size( const struct Lzip_index * const li )
{
if( fi->members <= 0 ) return 0;
return block_end( fi->member_vector[fi->members-1].mblock );
if( li->members <= 0 ) return 0;
return block_end( li->member_vector[li->members-1].mblock );
}
/* total size including trailing data (if any) */
static inline long long Fi_file_size( const struct File_index * const fi )
{ if( fi->isize >= 0 ) return fi->isize; else return 0; }
static inline long long Li_file_size( const struct Lzip_index * const li )
{ if( li->insize >= 0 ) return li->insize; else return 0; }
static inline const struct Block * Fi_dblock( const struct File_index * const fi,
static inline const struct Block * Li_dblock( const struct Lzip_index * const li,
const long i )
{ return &fi->member_vector[i].dblock; }
{ return &li->member_vector[i].dblock; }
static inline const struct Block * Fi_mblock( const struct File_index * const fi,
static inline const struct Block * Li_mblock( const struct Lzip_index * const li,
const long i )
{ return &fi->member_vector[i].mblock; }
{ return &li->member_vector[i].mblock; }
static inline unsigned Fi_dictionary_size( const struct File_index * const fi,
static inline unsigned Li_dictionary_size( const struct Lzip_index * const li,
const long i )
{ return fi->member_vector[i].dictionary_size; }
{ return li->member_vector[i].dictionary_size; }

main.c

@ -1,5 +1,5 @@
/* Clzip - LZMA lossless data compressor
Copyright (C) 2010-2018 Antonio Diaz Diaz.
Copyright (C) 2010-2019 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@ -36,20 +36,25 @@
#include <unistd.h>
#include <utime.h>
#include <sys/stat.h>
#if defined(__MSVCRT__)
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
#include <io.h>
#if defined(__MSVCRT__)
#define fchmod(x,y) 0
#define fchown(x,y,z) 0
#define strtoull strtoul
#define SIGHUP SIGTERM
#define S_ISSOCK(x) 0
#ifndef S_IRGRP
#define S_IRGRP 0
#define S_IWGRP 0
#define S_IROTH 0
#define S_IWOTH 0
#endif
#if defined(__OS2__)
#include <io.h>
#endif
#if defined(__DJGPP__)
#define S_ISSOCK(x) 0
#define S_ISVTX 0
#endif
#endif
#include "carg_parser.h"
@ -69,9 +74,8 @@
int verbosity = 0;
const char * const Program_name = "Clzip";
const char * const program_name = "clzip";
const char * const program_year = "2018";
const char * const program_year = "2019";
const char * invocation_name = 0;
const struct { const char * from; const char * to; } known_extensions[] = {
@ -87,6 +91,8 @@ struct Lzma_options
enum Mode { m_compress, m_decompress, m_list, m_test };
/* Variables used in signal handler context.
They are not declared volatile because the handler never returns. */
char * output_filename = 0;
int outfd = -1;
bool delete_output_on_interrupt = false;
@ -94,8 +100,18 @@ bool delete_output_on_interrupt = false;
static void show_help( void )
{
printf( "%s - LZMA lossless data compressor.\n", Program_name );
printf( "\nUsage: %s [options] [files]\n", invocation_name );
printf( "Clzip is a C language version of lzip, fully compatible with lzip 1.4 or\n"
"newer. As clzip is written in C, it may be easier to integrate in\n"
"applications like package managers, embedded devices, or systems lacking\n"
"a C++ compiler.\n"
"\nLzip is a lossless data compressor with a user interface similar to the\n"
"one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0)\n"
"or compress most files more than bzip2 (lzip -9). Decompression speed is\n"
"intermediate between gzip and bzip2. Lzip is better than gzip and bzip2\n"
"from a data recovery perspective. Lzip has been designed, written and\n"
"tested with great care to replace gzip and bzip2 as the standard\n"
"general-purpose compressed format for unix-like systems.\n"
"\nUsage: %s [options] [files]\n", invocation_name );
printf( "\nOptions:\n"
" -h, --help display this help and exit\n"
" -V, --version output version information and exit\n"
@ -111,7 +127,7 @@ static void show_help( void )
" -o, --output=<file> if reading standard input, write to <file>\n"
" -q, --quiet suppress all messages\n"
" -s, --dictionary-size=<bytes> set dictionary size limit in bytes [8 MiB]\n"
" -S, --volume-size=<bytes> set volume size limit in bytes, implies -k\n"
" -S, --volume-size=<bytes> set volume size limit in bytes\n"
" -t, --test test compressed file integrity\n"
" -v, --verbose be verbose (a 2nd -v gives more)\n"
" -0 .. -9 set compression level [default 6]\n"
@ -227,7 +243,7 @@ static unsigned long long getnum( const char * const ptr,
if( !errno && tail[0] )
{
const unsigned factor = ( tail[1] == 'i' ) ? 1024 : 1000;
int exponent = 0; /* 0 = bad multiplier */
int exponent = 0; /* 0 = bad multiplier */
int i;
switch( tail[0] )
{
@ -268,7 +284,7 @@ static int get_dict_size( const char * const arg )
const long bits = strtol( arg, &tail, 0 );
if( bits >= min_dictionary_bits &&
bits <= max_dictionary_bits && *tail == 0 )
return ( 1 << bits );
return 1 << bits;
return getnum( arg, min_dictionary_size, max_dictionary_size );
}
@ -423,8 +439,17 @@ static bool check_tty( const char * const input_filename, const int infd,
}
static void set_signals( void (*action)(int) )
{
signal( SIGHUP, action );
signal( SIGINT, action );
signal( SIGTERM, action );
}
void cleanup_and_fail( const int retval )
{
set_signals( SIG_IGN ); /* ignore signals */
if( delete_output_on_interrupt )
{
delete_output_on_interrupt = false;
@ -439,6 +464,14 @@ void cleanup_and_fail( const int retval )
}
void signal_handler( int sig )
{
if( sig ) {} /* keep compiler happy */
show_error( "Control-C or similar caught, quitting.", 0, false );
cleanup_and_fail( 1 );
}
/* Set permissions, owner and times. */
static void close_and_set_permissions( const struct stat * const in_statsp )
{
@ -518,13 +551,13 @@ static int compress( const unsigned long long cfile_size,
}
else
{
File_header header;
if( Fh_set_dictionary_size( header, encoder_options->dictionary_size ) &&
Lzip_header header;
if( Lh_set_dictionary_size( header, encoder_options->dictionary_size ) &&
encoder_options->match_len_limit >= min_match_len_limit &&
encoder_options->match_len_limit <= max_match_len )
encoder.e = (struct LZ_encoder *)malloc( sizeof *encoder.e );
else internal_error( "invalid argument to encoder." );
if( !encoder.e || !LZe_init( encoder.e, Fh_get_dictionary_size( header ),
if( !encoder.e || !LZe_init( encoder.e, Lh_get_dictionary_size( header ),
encoder_options->match_len_limit, infd, outfd ) )
error = true;
else encoder.eb = &encoder.e->eb;
@ -637,16 +670,16 @@ static int decompress( const unsigned long long cfile_size, const int infd,
{
int result, size;
unsigned dictionary_size;
File_header header;
Lzip_header header;
struct LZ_decoder decoder;
Rd_reset_member_position( &rdec );
size = Rd_read_data( &rdec, header, Fh_size );
size = Rd_read_data( &rdec, header, Lh_size );
if( Rd_finished( &rdec ) ) /* End Of File */
{
if( first_member )
{ show_file_error( pp->name, "File ends unexpectedly at member header.", 0 );
retval = 2; }
else if( Fh_verify_prefix( header, size ) )
else if( Lh_verify_prefix( header, size ) )
{ Pp_show_msg( pp, "Truncated header in multimember file." );
show_trailing_data( header, size, pp, true, -1 );
retval = 2; }
@ -655,11 +688,11 @@ static int decompress( const unsigned long long cfile_size, const int infd,
retval = 2;
break;
}
if( !Fh_verify_magic( header ) )
if( !Lh_verify_magic( header ) )
{
if( first_member )
{ show_file_error( pp->name, bad_magic_msg, 0 ); retval = 2; }
else if( !loose_trailing && Fh_verify_corrupt( header ) )
else if( !loose_trailing && Lh_verify_corrupt( header ) )
{ Pp_show_msg( pp, corrupt_mm_msg );
show_trailing_data( header, size, pp, false, -1 );
retval = 2; }
@ -667,10 +700,10 @@ static int decompress( const unsigned long long cfile_size, const int infd,
retval = 2;
break;
}
if( !Fh_verify_version( header ) )
{ Pp_show_msg( pp, bad_version( Fh_version( header ) ) );
if( !Lh_verify_version( header ) )
{ Pp_show_msg( pp, bad_version( Lh_version( header ) ) );
retval = 2; break; }
dictionary_size = Fh_get_dictionary_size( header );
dictionary_size = Lh_get_dictionary_size( header );
if( !isvalid_ds( dictionary_size ) )
{ Pp_show_msg( pp, bad_dict_msg ); retval = 2; break; }
@ -689,7 +722,8 @@ static int decompress( const unsigned long long cfile_size, const int infd,
{
Pp_show_msg( pp, 0 );
fprintf( stderr, "%s at pos %llu\n", ( result == 2 ) ?
"File ends unexpectedly" : "Decoder error", partial_file_pos );
"File ends unexpectedly" : "Decoder error",
partial_file_pos );
}
retval = 2; break;
}
@ -703,31 +737,13 @@ static int decompress( const unsigned long long cfile_size, const int infd,
}
void signal_handler( int sig )
{
if( sig ) {} /* keep compiler happy */
show_error( "Control-C or similar caught, quitting.", 0, false );
cleanup_and_fail( 1 );
}
static void set_signals( void )
{
signal( SIGHUP, signal_handler );
signal( SIGINT, signal_handler );
signal( SIGTERM, signal_handler );
}
void show_error( const char * const msg, const int errcode, const bool help )
{
if( verbosity < 0 ) return;
if( msg && msg[0] )
{
fprintf( stderr, "%s: %s", program_name, msg );
if( errcode > 0 ) fprintf( stderr, ": %s", strerror( errcode ) );
fputc( '\n', stderr );
}
fprintf( stderr, "%s: %s%s%s\n", program_name, msg,
( errcode > 0 ) ? ": " : "",
( errcode > 0 ) ? strerror( errcode ) : "" );
if( help )
fprintf( stderr, "Try '%s --help' for more information.\n",
invocation_name );
@ -737,10 +753,10 @@ void show_error( const char * const msg, const int errcode, const bool help )
void show_file_error( const char * const filename, const char * const msg,
const int errcode )
{
if( verbosity < 0 ) return;
fprintf( stderr, "%s: %s: %s", program_name, filename, msg );
if( errcode > 0 ) fprintf( stderr, ": %s", strerror( errcode ) );
fputc( '\n', stderr );
if( verbosity >= 0 )
fprintf( stderr, "%s: %s: %s%s%s\n", program_name, filename, msg,
( errcode > 0 ) ? ": " : "",
( errcode > 0 ) ? strerror( errcode ) : "" );
}
@ -933,7 +949,7 @@ int main( const int argc, const char * const argv[] )
}
} /* end process options */
#if defined(__MSVCRT__) || defined(__OS2__)
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
setmode( STDIN_FILENO, O_BINARY );
setmode( STDOUT_FILENO, O_BINARY );
#endif
@ -961,7 +977,7 @@ int main( const int argc, const char * const argv[] )
if( !to_stdout && program_mode != m_test &&
( filenames_given || default_output_filename[0] ) )
set_signals();
set_signals( signal_handler );
Pp_init( &pp, filenames, num_filenames );
@ -1044,6 +1060,12 @@ int main( const int argc, const char * const argv[] )
else
tmp = decompress( cfile_size, infd, &pp, ignore_trailing,
loose_trailing, program_mode == m_test );
if( close( infd ) != 0 )
{
show_error( input_filename[0] ? "Error closing input file" :
"Error closing stdin", errno, false );
if( tmp < 1 ) tmp = 1;
}
if( tmp > retval ) retval = tmp;
if( tmp )
{ if( program_mode != m_test ) cleanup_and_fail( retval );
@ -1053,7 +1075,6 @@ int main( const int argc, const char * const argv[] )
close_and_set_permissions( in_statsp );
if( input_filename[0] )
{
close( infd );
if( !keep_input_files && !to_stdout && program_mode != m_test &&
( program_mode != m_compress || volume_size == 0 ) )
remove( input_filename );
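
The last main.c hunks move the `close( infd )` call so that its return value is checked before the result of the operation is evaluated, and a failed close raises the exit status to at least 1. A minimal sketch of that pattern, assuming a hypothetical helper `close_input` (the real code does this inline in `main` and reports through `show_error`):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Close 'infd' and return the possibly raised status.
   'name' is the input file name; an empty string means standard input.
   A failed close turns a status of 0 into 1 but never lowers a higher one. */
static int close_input( const int infd, const char * const name, int retval )
  {
  if( close( infd ) != 0 )
    {
    fprintf( stderr, "%s: %s\n", name[0] ? "Error closing input file" :
             "Error closing stdin", strerror( errno ) );
    if( retval < 1 ) retval = 1;
    }
  return retval;
  }
```

Checking the status before the input file is removed matters because `remove( input_filename )` runs only when the combined status is still 0.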


@ -1,6 +1,6 @@
#! /bin/sh
# check script for Clzip - LZMA lossless data compressor
# Copyright (C) 2010-2018 Antonio Diaz Diaz.
# Copyright (C) 2010-2019 Antonio Diaz Diaz.
#
# This script is free software: you have unlimited permission
# to copy, distribute and modify it.
@ -36,12 +36,15 @@ test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; }
printf "testing clzip-%s..." "$2"
"${LZIP}" -fkqm4 in
{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO
[ $? = 1 ] || test_failed $LINENO
[ ! -e in.lz ] || test_failed $LINENO
"${LZIP}" -fkqm274 in
{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO
[ $? = 1 ] || test_failed $LINENO
[ ! -e in.lz ] || test_failed $LINENO
for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do
"${LZIP}" -fkqs $i in
{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO $i
[ $? = 1 ] || test_failed $LINENO $i
[ ! -e in.lz ] || test_failed $LINENO $i
done
"${LZIP}" -lq in
[ $? = 2 ] || test_failed $LINENO
@ -91,31 +94,34 @@ printf "\ntesting decompression..."
"${LZIP}" -cd "${in_lz}" > copy || test_failed $LINENO
cmp in copy || test_failed $LINENO
rm -f copy
rm -f copy || framework_failure
cat "${in_lz}" > copy.lz || framework_failure
"${LZIP}" -dk copy.lz || test_failed $LINENO
cmp in copy || test_failed $LINENO
printf "to be overwritten" > copy || framework_failure
"${LZIP}" -d copy.lz 2> /dev/null
[ $? = 1 ] || test_failed $LINENO
"${LZIP}" -df copy.lz
{ [ $? = 0 ] && [ ! -e copy.lz ] && cmp in copy ; } || test_failed $LINENO
"${LZIP}" -df copy.lz || test_failed $LINENO
[ ! -e copy.lz ] || test_failed $LINENO
cmp in copy || test_failed $LINENO
rm -f copy
rm -f copy || framework_failure
cat "${in_lz}" > copy.lz || framework_failure
"${LZIP}" -d -S100k copy.lz
{ [ $? = 0 ] && [ ! -e copy.lz ] && cmp in copy ; } || test_failed $LINENO
"${LZIP}" -d -S100k copy.lz || test_failed $LINENO # ignore -S
[ ! -e copy.lz ] || test_failed $LINENO
cmp in copy || test_failed $LINENO
printf "to be overwritten" > copy || framework_failure
"${LZIP}" -df -o copy < "${in_lz}" || test_failed $LINENO
cmp in copy || test_failed $LINENO
rm -f copy
rm -f copy || framework_failure
"${LZIP}" < in > anyothername || test_failed $LINENO
"${LZIP}" -dv --output copy - anyothername - < "${in_lz}" 2> /dev/null
{ [ $? = 0 ] && cmp in copy && cmp in anyothername.out ; } ||
"${LZIP}" -dv --output copy - anyothername - < "${in_lz}" 2> /dev/null ||
test_failed $LINENO
rm -f copy anyothername.out
cmp in copy || test_failed $LINENO
cmp in anyothername.out || test_failed $LINENO
rm -f copy anyothername.out || framework_failure
"${LZIP}" -lq in "${in_lz}"
[ $? = 2 ] || test_failed $LINENO
@ -126,10 +132,12 @@ rm -f copy anyothername.out
"${LZIP}" -tq nx_file.lz "${in_lz}"
[ $? = 1 ] || test_failed $LINENO
"${LZIP}" -cdq in "${in_lz}" > copy
{ [ $? = 2 ] && cat copy in | cmp in - ; } || test_failed $LINENO
[ $? = 2 ] || test_failed $LINENO
cat copy in | cmp in - || test_failed $LINENO
"${LZIP}" -cdq nx_file.lz "${in_lz}" > copy
{ [ $? = 1 ] && cmp in copy ; } || test_failed $LINENO
rm -f copy
[ $? = 1 ] || test_failed $LINENO
cmp in copy || test_failed $LINENO
rm -f copy || framework_failure
cat "${in_lz}" > copy.lz || framework_failure
for i in 1 2 3 4 5 6 7 ; do
printf "g" >> copy.lz || framework_failure
@ -139,11 +147,15 @@ for i in 1 2 3 4 5 6 7 ; do
[ $? = 2 ] || test_failed $LINENO $i
done
"${LZIP}" -dq in copy.lz
{ [ $? = 2 ] && [ -e copy.lz ] && [ ! -e copy ] && [ ! -e in.out ] ; } ||
test_failed $LINENO
[ $? = 2 ] || test_failed $LINENO
[ -e copy.lz ] || test_failed $LINENO
[ ! -e copy ] || test_failed $LINENO
[ ! -e in.out ] || test_failed $LINENO
"${LZIP}" -dq nx_file.lz copy.lz
{ [ $? = 1 ] && [ ! -e copy.lz ] && [ ! -e nx_file ] && cmp in copy ; } ||
test_failed $LINENO
[ $? = 1 ] || test_failed $LINENO
[ ! -e copy.lz ] || test_failed $LINENO
[ ! -e nx_file ] || test_failed $LINENO
cmp in copy || test_failed $LINENO
cat in in > in2 || framework_failure
cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
@ -160,7 +172,7 @@ cmp in2 copy2 || test_failed $LINENO
printf "\ngarbage" >> copy2.lz || framework_failure
"${LZIP}" -tvvvv copy2.lz 2> /dev/null || test_failed $LINENO
rm -f copy2
rm -f copy2 || framework_failure
"${LZIP}" -alq copy2.lz
[ $? = 2 ] || test_failed $LINENO
"${LZIP}" -atq copy2.lz
@ -168,12 +180,15 @@ rm -f copy2
"${LZIP}" -atq < copy2.lz
[ $? = 2 ] || test_failed $LINENO
"${LZIP}" -adkq copy2.lz
{ [ $? = 2 ] && [ ! -e copy2 ] ; } || test_failed $LINENO
[ $? = 2 ] || test_failed $LINENO
[ ! -e copy2 ] || test_failed $LINENO
"${LZIP}" -adkq -o copy2 < copy2.lz
{ [ $? = 2 ] && [ ! -e copy2 ] ; } || test_failed $LINENO
[ $? = 2 ] || test_failed $LINENO
[ ! -e copy2 ] || test_failed $LINENO
printf "to be overwritten" > copy2 || framework_failure
"${LZIP}" -df copy2.lz || test_failed $LINENO
cmp in2 copy2 || test_failed $LINENO
rm -f in2 copy2 || framework_failure
printf "\ntesting compression..."
@ -209,73 +224,94 @@ for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
"${LZIP}" -df -o copy < out.lz || test_failed $LINENO $i
cmp in copy || test_failed $LINENO $i
done
rm -f out.lz || framework_failure
cat in in in in in in in in > in8 || framework_failure
"${LZIP}" -1s12 -S100k in8 || test_failed $LINENO
"${LZIP}" -t in800001.lz in800002.lz || test_failed $LINENO
"${LZIP}" -cd in800001.lz in800002.lz | cmp in8 - || test_failed $LINENO
rm -f in800001.lz in800002.lz
rm -f in800001.lz in800002.lz || framework_failure
"${LZIP}" -1s12 -S100k -o out.lz < in8 || test_failed $LINENO
"${LZIP}" -t out.lz00001.lz out.lz00002.lz || test_failed $LINENO
"${LZIP}" -cd out.lz00001.lz out.lz00002.lz | cmp in8 - || test_failed $LINENO
rm -f out.lz00001.lz out.lz00002.lz
rm -f out.lz00001.lz out.lz00002.lz || framework_failure
"${LZIP}" -1ks4Ki -b100000 in8 || test_failed $LINENO
"${LZIP}" -t in8.lz || test_failed $LINENO
"${LZIP}" -cd in8.lz | cmp in8 - || test_failed $LINENO
rm -f in8
rm -f in8 || framework_failure
"${LZIP}" -0 -S100k -o out < in8.lz || test_failed $LINENO
"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO
"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO
rm -f out00001.lz
rm -f out00001.lz || framework_failure
"${LZIP}" -1 -S100k -o out < in8.lz || test_failed $LINENO
"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO
"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO
rm -f out00001.lz out00002.lz
rm -f out00001.lz out00002.lz || framework_failure
"${LZIP}" -0 -F -S100k in8.lz || test_failed $LINENO
"${LZIP}" -t in8.lz00001.lz in8.lz00002.lz || test_failed $LINENO
"${LZIP}" -cd in8.lz00001.lz in8.lz00002.lz | cmp in8.lz - || test_failed $LINENO
rm -f in8.lz00001.lz in8.lz00002.lz
rm -f in8.lz00001.lz in8.lz00002.lz || framework_failure
"${LZIP}" -0kF -b100k in8.lz || test_failed $LINENO
"${LZIP}" -t in8.lz.lz || test_failed $LINENO
"${LZIP}" -cd in8.lz.lz | cmp in8.lz - || test_failed $LINENO
rm -f in8.lz in8.lz.lz
rm -f in8.lz in8.lz.lz || framework_failure
printf "\ntesting bad input..."
headers='LZIp LZiP LZip LzIP LzIp LziP lZIP lZIp lZiP lzIP'
body='\001\014\000\203\377\373\377\377\300\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\000\000\000\000'
cat "${in_lz}" > in0.lz
printf "LZIP${body}" >> in0.lz
if "${LZIP}" -tq in0.lz ; then
cat "${in_lz}" > int.lz
printf "LZIP${body}" >> int.lz
if "${LZIP}" -tq int.lz ; then
for header in ${headers} ; do
printf "${header}${body}" > in0.lz # first member
"${LZIP}" -lq in0.lz
printf "${header}${body}" > int.lz # first member
"${LZIP}" -lq int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq in0.lz
"${LZIP}" -tq int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -lq --loose-trailing in0.lz
"${LZIP}" -tq < int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq --loose-trailing in0.lz
"${LZIP}" -cdq int.lz > /dev/null
[ $? = 2 ] || test_failed $LINENO ${header}
cat "${in_lz}" > in0.lz
printf "${header}${body}" >> in0.lz # trailing data
"${LZIP}" -lq in0.lz
"${LZIP}" -lq --loose-trailing int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq in0.lz
"${LZIP}" -tq --loose-trailing int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -lq --loose-trailing in0.lz
[ $? = 0 ] || test_failed $LINENO ${header}
"${LZIP}" -t --loose-trailing in0.lz
[ $? = 0 ] || test_failed $LINENO ${header}
"${LZIP}" -lq --loose-trailing --trailing-error in0.lz
"${LZIP}" -tq --loose-trailing < int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq --loose-trailing --trailing-error in0.lz
"${LZIP}" -cdq --loose-trailing int.lz > /dev/null
[ $? = 2 ] || test_failed $LINENO ${header}
cat "${in_lz}" > int.lz
printf "${header}${body}" >> int.lz # trailing data
"${LZIP}" -lq int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq < int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -cdq int.lz > /dev/null
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -lq --loose-trailing int.lz ||
test_failed $LINENO ${header}
"${LZIP}" -t --loose-trailing int.lz ||
test_failed $LINENO ${header}
"${LZIP}" -t --loose-trailing < int.lz ||
test_failed $LINENO ${header}
"${LZIP}" -cd --loose-trailing int.lz > /dev/null ||
test_failed $LINENO ${header}
"${LZIP}" -lq --loose-trailing --trailing-error int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq --loose-trailing --trailing-error int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -tq --loose-trailing --trailing-error < int.lz
[ $? = 2 ] || test_failed $LINENO ${header}
"${LZIP}" -cdq --loose-trailing --trailing-error int.lz > /dev/null
[ $? = 2 ] || test_failed $LINENO ${header}
done
else
printf "\nwarning: skipping header test: 'printf' does not work on your system."
fi
rm -f in0.lz
rm -f int.lz || framework_failure
cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure
if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
@ -296,7 +332,7 @@ if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
else
printf "\nwarning: skipping truncation test: 'dd' does not work on your system."
fi
rm -f in3.lz trunc.lz
rm -f in2.lz in3.lz trunc.lz out || framework_failure
cat "${in_lz}" > ingin.lz || framework_failure
printf "g" >> ingin.lz || framework_failure
@ -309,7 +345,7 @@ cmp in copy || test_failed $LINENO
"${LZIP}" -t < ingin.lz || test_failed $LINENO
"${LZIP}" -d < ingin.lz > copy || test_failed $LINENO
cmp in copy || test_failed $LINENO
rm -f ingin.lz
rm -f copy ingin.lz || framework_failure
echo
if [ ${fail} = 0 ] ; then