Merging upstream version 1.11.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent c1d97756f3
commit d865a97d34
26 changed files with 1012 additions and 896 deletions
22
ChangeLog
|
@@ -1,7 +1,19 @@
|
|||
2019-01-03 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.11 released.
|
||||
* File_* renamed to Lzip_*.
|
||||
* lzip.h (Lzip_trailer): New function 'Lt_verify_consistency'.
|
||||
* lzip_index.c: Detect some kinds of corrupt trailers.
|
||||
* main.c (main): Check return value of close( infd ).
|
||||
* main.c: Compile on DOS with DJGPP.
|
||||
* clzip.texi: Improved descriptions of '-0..-9', '-m' and '-s'.
|
||||
* configure: Accept appending to CFLAGS, 'CFLAGS+=OPTIONS'.
|
||||
* INSTALL: Document use of CFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
|
||||
|
||||
2018-02-06 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.10 released.
|
||||
* main.c: Added new option '--loose-trailing'.
|
||||
* Added new option '--loose-trailing'.
|
||||
* Improved corrupt header detection to HD=3.
|
||||
* main.c: Show corrupt or truncated header in multimember file.
|
||||
* main.c (main): Option '-S, --volume-size' now keeps input files.
|
||||
|
@@ -25,14 +37,14 @@
|
|||
* Decompression time has been reduced by 7%.
|
||||
* main.c: Continue testing if any input file is a terminal.
|
||||
* main.c: Show trailing data in both hexadecimal and ASCII.
|
||||
* file_index.c: Improve detection of bad dict and trailing data.
|
||||
* lzip_index.c: Improve detection of bad dict and trailing data.
|
||||
* lzip.h: Unified messages for bad magic, trailing data, etc.
|
||||
* clzip.texi: Added missing chapters from lzip.texi.
|
||||
|
||||
2016-05-13 Antonio Diaz Diaz <antonio@gnu.org>
|
||||
|
||||
* Version 1.8 released.
|
||||
* main.c: Added new option '-a, --trailing-error'.
|
||||
* Added new option '-a, --trailing-error'.
|
||||
* main.c (decompress): Print up to 6 bytes of trailing data
|
||||
when '-vvvv' is specified.
|
||||
* decoder.c (LZd_verify_trailer): Removed test of final code.
|
||||
|
@@ -92,7 +104,7 @@
|
|||
2011-05-18 Antonio Diaz Diaz <ant_diaz@teleline.es>
|
||||
|
||||
* Version 1.2 released.
|
||||
* main.c: Added new option '-F, --recompress'.
|
||||
* Added new option '-F, --recompress'.
|
||||
* main.c (decompress): Print only one status line for each
|
||||
multimember file when only one '-v' is specified.
|
||||
* encoder.h (Lee_update_prices): Update high length symbol prices
|
||||
|
@@ -125,7 +137,7 @@
|
|||
* Translated to C from the C++ source of lzip 1.10.
|
||||
|
||||
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This file is a collection of facts, and thus it is not copyrightable,
|
||||
but just in case, you have unlimited permission to copy, distribute and
|
||||
|
|
14
INSTALL
|
@@ -1,10 +1,14 @@
|
|||
Requirements
|
||||
------------
|
||||
You will need a C compiler.
|
||||
I use gcc 5.3.0 and 4.1.2, but the code should compile with any
|
||||
standards compliant compiler.
|
||||
I use gcc 5.3.0 and 4.1.2, but the code should compile with any standards
|
||||
compliant compiler.
|
||||
Gcc is available at http://gcc.gnu.org.
|
||||
|
||||
The operating system must allow signal handlers read access to objects with
|
||||
static storage duration so that the cleanup handler for Control-C can delete
|
||||
the partial output file.
|
||||
|
||||
|
||||
Procedure
|
||||
---------
|
||||
|
@@ -23,6 +27,10 @@ the main archive.
|
|||
cd clzip[version]
|
||||
./configure
|
||||
|
||||
If you are compiling on MinGW, use:
|
||||
|
||||
./configure CFLAGS+='-D __USE_MINGW_ANSI_STDIO'
|
||||
|
||||
3. Run make.
|
||||
|
||||
make
|
||||
|
@@ -62,7 +70,7 @@ After running 'configure', you can run 'make' and 'make install' as
|
|||
explained above.
|
||||
|
||||
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This file is free documentation: you have unlimited permission to copy,
|
||||
distribute and modify it.
|
||||
|
|
|
@@ -7,7 +7,7 @@ INSTALL_DIR = $(INSTALL) -d -m 755
|
|||
SHELL = /bin/sh
|
||||
CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1
|
||||
|
||||
objs = carg_parser.o file_index.o list.o encoder_base.o encoder.o \
|
||||
objs = carg_parser.o lzip_index.o list.o encoder_base.o encoder.o \
|
||||
fast_encoder.o decoder.o main.o
|
||||
|
||||
|
||||
|
@@ -35,8 +35,8 @@ decoder.o : lzip.h decoder.h
|
|||
encoder_base.o : lzip.h encoder_base.h
|
||||
encoder.o : lzip.h encoder_base.h encoder.h
|
||||
fast_encoder.o : lzip.h encoder_base.h fast_encoder.h
|
||||
file_index.o : lzip.h file_index.h
|
||||
list.o : lzip.h file_index.h
|
||||
list.o : lzip.h lzip_index.h
|
||||
lzip_index.o : lzip.h lzip_index.h
|
||||
main.o : carg_parser.h lzip.h decoder.h encoder_base.h encoder.h fast_encoder.h
|
||||
|
||||
|
||||
|
|
47
NEWS
|
@@ -1,42 +1,17 @@
|
|||
Changes in version 1.10:
|
||||
Changes in version 1.11:
|
||||
|
||||
The option '--loose-trailing', has been added.
|
||||
Detection of forbidden combinations of characters in trailing data has been
|
||||
improved.
|
||||
|
||||
The test used by clzip to discriminate trailing data from a corrupt
|
||||
header in multimember or concatenated files has been improved to a
|
||||
Hamming distance (HD) of 3, and the 3 bit flips must happen in different
|
||||
magic bytes for the test to fail. As a consequence some kinds of files
|
||||
no longer can be appended to a lzip file as trailing data unless the
|
||||
'--loose-trailing' option is used when decompressing.
|
||||
Lziprecover can be used to remove conflicting trailing data from a file.
|
||||
Errors are now also checked when closing the input file.
|
||||
|
||||
The contents of a corrupt or truncated header found in a multimember
|
||||
file is now shown, after the error message, in the same format as
|
||||
trailing data.
|
||||
Clzip now compiles on DOS with DJGPP. (Patch from Robert Riebisch).
|
||||
|
||||
Option '-S, --volume-size' now keeps input files unchanged.
|
||||
The descriptions of '-0..-9', '-m' and '-s' in the manual have been
|
||||
improved.
|
||||
|
||||
When creating multimember files or splitting the output in volumes, the
|
||||
dictionary size is now adjusted for each member individually.
|
||||
The configure script now accepts appending options to CFLAGS using the
|
||||
syntax 'CFLAGS+=OPTIONS'.
|
||||
|
||||
The 'bits/byte' ratio has been replaced with the inverse compression
|
||||
ratio in the output.
|
||||
|
||||
The progress of decompression is now shown at verbosity level 2 (-vv) or
|
||||
higher.
|
||||
|
||||
Progress of (de)compression is only shown if stderr is a terminal.
|
||||
|
||||
A final diagnostic is now shown at verbosity level 1 (-v) or higher if
|
||||
any file fails the test when testing multiple files.
|
||||
|
||||
A second '.lz' extension is no longer added to the argument of '-o' if
|
||||
it already ends in '.lz' or '.tlz'.
|
||||
|
||||
In case of (de)compressed size mismatch, the stored size is now also
|
||||
shown in hexadecimal to ease visual comparison.
|
||||
|
||||
The dictionary size is now shown at verbosity level 4 (-vvvv) when
|
||||
decompressing or testing.
|
||||
|
||||
The new chapter "Meaning of clzip's output" has been added to the manual.
|
||||
It has been documented in INSTALL the use of
|
||||
CFLAGS+='-D __USE_MINGW_ANSI_STDIO' when compiling on MinGW.
|
||||
|
|
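The 1.11 NEWS entry "Errors are now also checked when closing the input file" matches the ChangeLog line about checking the return value of close( infd ). A minimal sketch of that kind of check is shown below. The helper name 'close_checked' and the message wording are assumptions made for illustration; they are not taken from main.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not clzip's code): close 'fd' and report failure.
   Returns 0 on success, 1 if close() reported an error. */
int close_checked( const int fd, const char * const name )
  {
  assert( name != NULL );
  if( close( fd ) != 0 )
    {
    fprintf( stderr, "%s: Error closing input file: %s\n",
             name, strerror( errno ) );
    return 1;
    }
  return 0;
  }
```

A caller would typically fold the result into its exit status, so that a failed close on the input descriptor is not silently ignored.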
58
README
|
@@ -1,32 +1,33 @@
|
|||
Description
|
||||
|
||||
Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
|
||||
Clzip is a C language version of lzip, fully compatible with lzip 1.4 or
|
||||
newer. As clzip is written in C, it may be easier to integrate in
|
||||
applications like package managers, embedded devices, or systems lacking
|
||||
a C++ compiler.
|
||||
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0),
|
||||
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0)
|
||||
or compress most files more than bzip2 (lzip -9). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
|
||||
from a data recovery perspective.
|
||||
from a data recovery perspective. Lzip has been designed, written and
|
||||
tested with great care to replace gzip and bzip2 as the standard
|
||||
general-purpose compressed format for unix-like systems.
|
||||
|
||||
The lzip file format is designed for data sharing and long-term
|
||||
archiving, taking into account both data integrity and decoder
|
||||
availability:
|
||||
The lzip file format is designed for data sharing and long-term archiving,
|
||||
taking into account both data integrity and decoder availability:
|
||||
|
||||
* The lzip format provides very safe integrity checking and some data
|
||||
recovery means. The lziprecover program can repair bit-flip errors
|
||||
recovery means. The lziprecover program can repair bit flip errors
|
||||
(one of the most common forms of data corruption) in lzip files,
|
||||
and provides data recovery capabilities, including error-checked
|
||||
merging of damaged copies of a file.
|
||||
|
||||
* The lzip format is as simple as possible (but not simpler). The
|
||||
lzip manual provides the source code of a simple decompressor along
|
||||
with a detailed explanation of how it works, so that with the only
|
||||
help of the lzip manual it would be possible for a digital
|
||||
archaeologist to extract the data from a lzip file long after
|
||||
quantum computers eventually render LZMA obsolete.
|
||||
lzip manual provides the source code of a simple decompressor
|
||||
along with a detailed explanation of how it works, so that with
|
||||
the only help of the lzip manual it would be possible for a
|
||||
digital archaeologist to extract the data from a lzip file long
|
||||
after quantum computers eventually render LZMA obsolete.
|
||||
|
||||
* Additionally the lzip reference implementation is copylefted, which
|
||||
guarantees that it will remain free forever.
|
||||
|
@@ -36,15 +37,14 @@ repair the nearer it is from the beginning of the file. Therefore, with
|
|||
the help of lziprecover, losing an entire archive just because of a
|
||||
corrupt byte near the beginning is a thing of the past.
|
||||
|
||||
Clzip uses the same well-defined exit status values used by lzip and
|
||||
bzip2, which makes it safer than compressors returning ambiguous warning
|
||||
values (like gzip) when it is used as a back end for other programs like
|
||||
tar or zutils.
|
||||
Clzip uses the same well-defined exit status values used by lzip, which
|
||||
makes it safer than compressors returning ambiguous warning values (like
|
||||
gzip) when it is used as a back end for other programs like tar or zutils.
|
||||
|
||||
Clzip will automatically use the smallest possible dictionary size for
|
||||
each file without exceeding the given limit. Keep in mind that the
|
||||
decompression memory requirement is affected at compression time by the
|
||||
choice of dictionary size limit.
|
||||
Clzip will automatically use for each file the largest dictionary size
|
||||
that does not exceed neither the file size nor the limit given. Keep in
|
||||
mind that the decompression memory requirement is affected at
|
||||
compression time by the choice of dictionary size limit.
|
||||
|
||||
The amount of memory required for compression is about 1 or 2 times the
|
||||
dictionary size limit (1 if input file size is less than dictionary size
|
||||
|
@@ -64,22 +64,22 @@ anyothername becomes anyothername.out
|
|||
|
||||
(De)compressing a file is much like copying or moving it; therefore clzip
|
||||
preserves the access and modification dates, permissions, and, when
|
||||
possible, ownership of the file just as "cp -p" does. (If the user ID or
|
||||
possible, ownership of the file just as 'cp -p' does. (If the user ID or
|
||||
the group ID can't be duplicated, the file permission bits S_ISUID and
|
||||
S_ISGID are cleared).
|
||||
|
||||
Clzip is able to read from some types of non regular files if the
|
||||
"--stdout" option is specified.
|
||||
'--stdout' option is specified.
|
||||
|
||||
If no file names are specified, clzip compresses (or decompresses) from
|
||||
standard input to standard output. In this case, clzip will decline to
|
||||
write compressed output to a terminal, as this would be entirely
|
||||
incomprehensible and therefore pointless.
|
||||
|
||||
Clzip will correctly decompress a file which is the concatenation of two
|
||||
or more compressed files. The result is the concatenation of the
|
||||
corresponding decompressed files. Integrity testing of concatenated
|
||||
compressed files is also supported.
|
||||
Clzip will correctly decompress a file which is the concatenation of two or
|
||||
more compressed files. The result is the concatenation of the corresponding
|
||||
decompressed files. Integrity testing of concatenated compressed files is
|
||||
also supported.
|
||||
|
||||
Clzip can produce multimember files, and lziprecover can safely recover
|
||||
the undamaged members in case of file damage. Clzip can also split the
|
||||
|
@@ -115,8 +115,12 @@ the definition of Markov chains), G.N.N. Martin (for the definition of
|
|||
range encoding), Igor Pavlov (for putting all the above together in
|
||||
LZMA), and Julian Seward (for bzip2's CLI).
|
||||
|
||||
LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
|
||||
have been compressed. Decompressed is used to refer to data which have
|
||||
undergone the process of decompression.
|
||||
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This file is free documentation: you have unlimited permission to copy,
|
||||
distribute and modify it.
|
||||
|
|
|
@@ -1,5 +1,5 @@
|
|||
/* Arg_parser - POSIX/GNU command line argument parser. (C version)
|
||||
Copyright (C) 2006-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2006-2019 Antonio Diaz Diaz.
|
||||
|
||||
This library is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
|
|
@@ -1,5 +1,5 @@
|
|||
/* Arg_parser - POSIX/GNU command line argument parser. (C version)
|
||||
Copyright (C) 2006-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2006-2019 Antonio Diaz Diaz.
|
||||
|
||||
This library is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
|
16
configure
vendored
|
@@ -1,12 +1,12 @@
|
|||
#! /bin/sh
|
||||
# configure script for Clzip - LZMA lossless data compressor
|
||||
# Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
# Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
#
|
||||
# This configure script is free software: you have unlimited permission
|
||||
# to copy, distribute and modify it.
|
||||
|
||||
pkgname=clzip
|
||||
pkgversion=1.10
|
||||
pkgversion=1.11
|
||||
progname=clzip
|
||||
srctrigger=doc/${pkgname}.texi
|
||||
|
||||
|
@@ -70,6 +70,7 @@ while [ $# != 0 ] ; do
|
|||
echo " CC=COMPILER C compiler to use [${CC}]"
|
||||
echo " CPPFLAGS=OPTIONS command line options for the preprocessor [${CPPFLAGS}]"
|
||||
echo " CFLAGS=OPTIONS command line options for the C compiler [${CFLAGS}]"
|
||||
echo " CFLAGS+=OPTIONS append options to the current value of CFLAGS"
|
||||
echo " LDFLAGS=OPTIONS command line options for the linker [${LDFLAGS}]"
|
||||
echo
|
||||
exit 0 ;;
|
||||
|
@@ -93,10 +94,11 @@ while [ $# != 0 ] ; do
|
|||
--mandir=*) mandir=${optarg} ;;
|
||||
--no-create) no_create=yes ;;
|
||||
|
||||
CC=*) CC=${optarg} ;;
|
||||
CPPFLAGS=*) CPPFLAGS=${optarg} ;;
|
||||
CFLAGS=*) CFLAGS=${optarg} ;;
|
||||
LDFLAGS=*) LDFLAGS=${optarg} ;;
|
||||
CC=*) CC=${optarg} ;;
|
||||
CPPFLAGS=*) CPPFLAGS=${optarg} ;;
|
||||
CFLAGS=*) CFLAGS=${optarg} ;;
|
||||
CFLAGS+=*) CFLAGS="${CFLAGS} ${optarg}" ;;
|
||||
LDFLAGS=*) LDFLAGS=${optarg} ;;
|
||||
|
||||
--*)
|
||||
echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
|
||||
|
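The hunk above adds a 'CFLAGS+=*' arm that appends to the current value of CFLAGS instead of replacing it. The same logic can be sketched in C for readers who want to see the two cases side by side. This is only an analogue written for illustration: the function 'apply_cflags_arg' and its buffer handling are assumptions, and the configure script itself remains plain shell.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative C analogue of the two configure 'case' arms (hypothetical
   helper, not part of clzip). 'cflags' holds the current value and is
   updated in place. Returns 1 if 'arg' was a CFLAGS assignment that fit
   in the buffer, 0 otherwise (buffer left unchanged). */
int apply_cflags_arg( char * const cflags, const size_t size,
                      const char * const arg )
  {
  char tmp[1024];
  int len;
  assert( cflags != NULL && arg != NULL );
  if( strncmp( arg, "CFLAGS+=", 8 ) == 0 )      /* append: "old new" */
    len = snprintf( tmp, sizeof tmp, "%s %s", cflags, arg + 8 );
  else if( strncmp( arg, "CFLAGS=", 7 ) == 0 )  /* replace */
    len = snprintf( tmp, sizeof tmp, "%s", arg + 7 );
  else return 0;
  if( len < 0 || (size_t)len >= sizeof tmp || (size_t)len >= size )
    return 0;
  memcpy( cflags, tmp, (size_t)len + 1 );
  return 1;
  }
```

As in the shell version, appending always inserts a single space between the old value and the new options, even when the old value is empty.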
@@ -168,7 +170,7 @@ echo "LDFLAGS = ${LDFLAGS}"
|
|||
rm -f Makefile
|
||||
cat > Makefile << EOF
|
||||
# Makefile for Clzip - LZMA lossless data compressor
|
||||
# Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
# Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
# This file was generated automatically by configure. Don't edit.
|
||||
#
|
||||
# This Makefile is free software: you have unlimited permission
|
||||
|
|
142
decoder.c
|
@@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@@ -101,15 +101,15 @@ void LZd_flush_data( struct LZ_decoder * const d )
|
|||
static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
||||
struct Pretty_print * const pp )
|
||||
{
|
||||
File_trailer trailer;
|
||||
int size = Rd_read_data( d->rdec, trailer, Ft_size );
|
||||
Lzip_trailer trailer;
|
||||
int size = Rd_read_data( d->rdec, trailer, Lt_size );
|
||||
const unsigned long long data_size = LZd_data_position( d );
|
||||
const unsigned long long member_size = Rd_member_position( d->rdec );
|
||||
unsigned td_crc;
|
||||
unsigned long long td_size, tm_size;
|
||||
bool error = false;
|
||||
|
||||
if( size < Ft_size )
|
||||
if( size < Lt_size )
|
||||
{
|
||||
error = true;
|
||||
if( verbosity >= 0 )
|
||||
|
@@ -118,10 +118,10 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
fprintf( stderr, "Trailer truncated at trailer position %d;"
|
||||
" some checks may fail.\n", size );
|
||||
}
|
||||
while( size < Ft_size ) trailer[size++] = 0;
|
||||
while( size < Lt_size ) trailer[size++] = 0;
|
||||
}
|
||||
|
||||
td_crc = Ft_get_data_crc( trailer );
|
||||
td_crc = Lt_get_data_crc( trailer );
|
||||
if( td_crc != LZd_crc( d ) )
|
||||
{
|
||||
error = true;
|
||||
|
@@ -132,7 +132,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
td_crc, LZd_crc( d ) );
|
||||
}
|
||||
}
|
||||
td_size = Ft_get_data_size( trailer );
|
||||
td_size = Lt_get_data_size( trailer );
|
||||
if( td_size != data_size )
|
||||
{
|
||||
error = true;
|
||||
|
@@ -143,7 +143,7 @@ static bool LZd_verify_trailer( struct LZ_decoder * const d,
|
|||
td_size, td_size, data_size, data_size );
|
||||
}
|
||||
}
|
||||
tm_size = Ft_get_member_size( trailer );
|
||||
tm_size = Lt_get_member_size( trailer );
|
||||
if( tm_size != member_size )
|
||||
{
|
||||
error = true;
|
||||
|
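The LZd_verify_trailer hunks above compare three stored values (data CRC, data size, member size) with the values computed during decoding. The sketch below shows that comparison in isolation. It assumes the trailer layout described for the lzip format as I understand it: 20 bytes, little-endian, holding a 4-byte CRC32, an 8-byte data size and an 8-byte member size. All names here are hypothetical and are not the Lt_* functions from lzip.h.

```c
#include <assert.h>
#include <stdint.h>

enum { trailer_size = 20 };  /* assumed: 4 (CRC32) + 8 (data) + 8 (member) */

/* Read 'n' bytes starting at 'p' as a little-endian unsigned integer. */
static uint64_t get_le( const uint8_t * const p, const int n )
  {
  uint64_t v = 0;
  int i;
  assert( n >= 1 && n <= 8 );
  for( i = n - 1; i >= 0; --i ) v = ( v << 8 ) | p[i];
  return v;
  }

uint32_t trailer_data_crc( const uint8_t t[trailer_size] )
  { return (uint32_t)get_le( t, 4 ); }

uint64_t trailer_data_size( const uint8_t t[trailer_size] )
  { return get_le( t + 4, 8 ); }

uint64_t trailer_member_size( const uint8_t t[trailer_size] )
  { return get_le( t + 12, 8 ); }

/* Compare the stored values with those computed while decoding.
   Returns a bit mask: 1 = CRC mismatch, 2 = data size mismatch,
   4 = member size mismatch; 0 means all three checks passed. */
int check_trailer( const uint8_t t[trailer_size], const uint32_t crc,
                   const uint64_t data_size, const uint64_t member_size )
  {
  int errors = 0;
  if( trailer_data_crc( t ) != crc ) errors |= 1;
  if( trailer_data_size( t ) != data_size ) errors |= 2;
  if( trailer_member_size( t ) != member_size ) errors |= 4;
  return errors;
  }
```

Like the function in the diff, the sketch evaluates all three checks instead of stopping at the first mismatch, so every failing field can be reported.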
@@ -214,9 +214,11 @@ int LZd_decode_member( struct LZ_decoder * const d,
|
|||
Rd_load( rdec );
|
||||
while( !Rd_finished( rdec ) )
|
||||
{
|
||||
int len;
|
||||
const int pos_state = LZd_data_position( d ) & pos_state_mask;
|
||||
if( Rd_decode_bit( rdec, &bm_match[state][pos_state] ) == 0 ) /* 1st bit */
|
||||
if( Rd_decode_bit( rdec, &bm_match[state][pos_state] ) == 0 ) /* 1st bit */
|
||||
{
|
||||
/* literal byte */
|
||||
Bit_model * const bm = bm_literal[get_lit_state(LZd_peek_prev( d ))];
|
||||
if( St_is_char( state ) )
|
||||
{
|
||||
|
@@ -228,83 +230,81 @@ int LZd_decode_member( struct LZ_decoder * const d,
|
|||
state -= ( state < 10 ) ? 3 : 6;
|
||||
LZd_put_byte( d, Rd_decode_matched( rdec, bm, LZd_peek( d, rep0 ) ) );
|
||||
}
|
||||
continue;
|
||||
}
|
||||
else /* match or repeated match */
|
||||
/* match or repeated match */
|
||||
if( Rd_decode_bit( rdec, &bm_rep[state] ) != 0 ) /* 2nd bit */
|
||||
{
|
||||
int len;
|
||||
if( Rd_decode_bit( rdec, &bm_rep[state] ) != 0 ) /* 2nd bit */
|
||||
if( Rd_decode_bit( rdec, &bm_rep0[state] ) == 0 ) /* 3rd bit */
|
||||
{
|
||||
if( Rd_decode_bit( rdec, &bm_rep0[state] ) == 0 ) /* 3rd bit */
|
||||
{
|
||||
if( Rd_decode_bit( rdec, &bm_len[state][pos_state] ) == 0 ) /* 4th bit */
|
||||
{ state = St_set_short_rep( state );
|
||||
LZd_put_byte( d, LZd_peek( d, rep0 ) ); continue; }
|
||||
}
|
||||
else
|
||||
{
|
||||
unsigned distance;
|
||||
if( Rd_decode_bit( rdec, &bm_rep1[state] ) == 0 ) /* 4th bit */
|
||||
distance = rep1;
|
||||
else
|
||||
{
|
||||
if( Rd_decode_bit( rdec, &bm_rep2[state] ) == 0 ) /* 5th bit */
|
||||
distance = rep2;
|
||||
else
|
||||
{ distance = rep3; rep3 = rep2; }
|
||||
rep2 = rep1;
|
||||
}
|
||||
rep1 = rep0;
|
||||
rep0 = distance;
|
||||
}
|
||||
state = St_set_rep( state );
|
||||
len = min_match_len + Rd_decode_len( rdec, &rep_len_model, pos_state );
|
||||
if( Rd_decode_bit( rdec, &bm_len[state][pos_state] ) == 0 ) /* 4th bit */
|
||||
{ state = St_set_short_rep( state );
|
||||
LZd_put_byte( d, LZd_peek( d, rep0 ) ); continue; }
|
||||
}
|
||||
else /* match */
|
||||
else
|
||||
{
|
||||
unsigned distance;
|
||||
len = min_match_len + Rd_decode_len( rdec, &match_len_model, pos_state );
|
||||
distance = Rd_decode_tree6( rdec, bm_dis_slot[get_len_state(len)] );
|
||||
if( distance >= start_dis_model )
|
||||
if( Rd_decode_bit( rdec, &bm_rep1[state] ) == 0 ) /* 4th bit */
|
||||
distance = rep1;
|
||||
else
|
||||
{
|
||||
const unsigned dis_slot = distance;
|
||||
const int direct_bits = ( dis_slot >> 1 ) - 1;
|
||||
distance = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
|
||||
if( dis_slot < end_dis_model )
|
||||
distance += Rd_decode_tree_reversed( rdec,
|
||||
bm_dis + ( distance - dis_slot ), direct_bits );
|
||||
if( Rd_decode_bit( rdec, &bm_rep2[state] ) == 0 ) /* 5th bit */
|
||||
distance = rep2;
|
||||
else
|
||||
{ distance = rep3; rep3 = rep2; }
|
||||
rep2 = rep1;
|
||||
}
|
||||
rep1 = rep0;
|
||||
rep0 = distance;
|
||||
}
|
||||
state = St_set_rep( state );
|
||||
len = min_match_len + Rd_decode_len( rdec, &rep_len_model, pos_state );
|
||||
}
|
||||
else /* match */
|
||||
{
|
||||
unsigned distance;
|
||||
len = min_match_len + Rd_decode_len( rdec, &match_len_model, pos_state );
|
||||
distance = Rd_decode_tree6( rdec, bm_dis_slot[get_len_state(len)] );
|
||||
if( distance >= start_dis_model )
|
||||
{
|
||||
const unsigned dis_slot = distance;
|
||||
const int direct_bits = ( dis_slot >> 1 ) - 1;
|
||||
distance = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
|
||||
if( dis_slot < end_dis_model )
|
||||
distance += Rd_decode_tree_reversed( rdec,
|
||||
bm_dis + ( distance - dis_slot ), direct_bits );
|
||||
else
|
||||
{
|
||||
distance +=
|
||||
Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits;
|
||||
distance += Rd_decode_tree_reversed4( rdec, bm_align );
|
||||
if( distance == 0xFFFFFFFFU ) /* marker found */
|
||||
{
|
||||
distance +=
|
||||
Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits;
|
||||
distance += Rd_decode_tree_reversed4( rdec, bm_align );
|
||||
if( distance == 0xFFFFFFFFU ) /* marker found */
|
||||
Rd_normalize( rdec );
|
||||
LZd_flush_data( d );
|
||||
if( len == min_match_len ) /* End Of Stream marker */
|
||||
{
|
||||
Rd_normalize( rdec );
|
||||
LZd_flush_data( d );
|
||||
if( len == min_match_len ) /* End Of Stream marker */
|
||||
{
|
||||
if( LZd_verify_trailer( d, pp ) ) return 0; else return 3;
|
||||
}
|
||||
if( len == min_match_len + 1 ) /* Sync Flush marker */
|
||||
{
|
||||
Rd_load( rdec ); continue;
|
||||
}
|
||||
if( verbosity >= 0 )
|
||||
{
|
||||
Pp_show_msg( pp, 0 );
|
||||
fprintf( stderr, "Unsupported marker code '%d'\n", len );
|
||||
}
|
||||
return 4;
|
||||
if( LZd_verify_trailer( d, pp ) ) return 0; else return 3;
|
||||
}
|
||||
if( len == min_match_len + 1 ) /* Sync Flush marker */
|
||||
{
|
||||
Rd_load( rdec ); continue;
|
||||
}
|
||||
if( verbosity >= 0 )
|
||||
{
|
||||
Pp_show_msg( pp, 0 );
|
||||
fprintf( stderr, "Unsupported marker code '%d'\n", len );
|
||||
}
|
||||
return 4;
|
||||
}
|
||||
}
|
||||
rep3 = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance;
|
||||
state = St_set_match( state );
|
||||
if( rep0 >= d->dictionary_size || ( rep0 >= d->pos && !d->pos_wrapped ) )
|
||||
{ LZd_flush_data( d ); return 1; }
|
||||
}
|
||||
LZd_copy_block( d, rep0, len );
|
||||
rep3 = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance;
|
||||
state = St_set_match( state );
|
||||
if( rep0 >= d->dictionary_size || ( rep0 >= d->pos && !d->pos_wrapped ) )
|
||||
{ LZd_flush_data( d ); return 1; }
|
||||
}
|
||||
LZd_copy_block( d, rep0, len );
|
||||
}
|
||||
LZd_flush_data( d );
|
||||
return 2;
|
||||
|
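In the match branch of LZd_decode_member above, a 6-bit distance slot is expanded into a base distance plus a number of extra bits ('direct_bits'). The arithmetic can be tried on its own with the sketch below. The function name is hypothetical, and the value 4 for 'start_dis_model' is an assumption taken from my reading of lzip.h, since the diff does not show the constant.

```c
#include <assert.h>
#include <stdint.h>

enum { start_dis_model = 4 };  /* assumed value; smaller slots are the distance */

/* Return the base distance for 'dis_slot' (0 to 63) and store in
   '*direct_bits' how many extra low-order bits must still be decoded
   and added to it (0 for slots below start_dis_model). */
uint32_t slot_base_distance( const unsigned dis_slot, int * const direct_bits )
  {
  assert( dis_slot < 64 && direct_bits != 0 );
  if( dis_slot < start_dis_model ) { *direct_bits = 0; return dis_slot; }
  *direct_bits = (int)( dis_slot >> 1 ) - 1;
  return (uint32_t)( 2 | ( dis_slot & 1 ) ) << *direct_bits;
  }
```

For slot 63 the base is 0xC0000000 with 30 extra bits. If all of those bits are set the sum is 0xFFFFFFFF, which is the value the decoder in the diff treats as a marker rather than as a real distance.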
|
|
@@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
|
19
doc/clzip.1
|
@@ -1,12 +1,23 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||
.TH CLZIP "1" "February 2018" "clzip 1.10" "User Commands"
|
||||
.TH CLZIP "1" "January 2019" "clzip 1.11" "User Commands"
|
||||
.SH NAME
|
||||
clzip \- reduces the size of files
|
||||
.SH SYNOPSIS
|
||||
.B clzip
|
||||
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||
.SH DESCRIPTION
|
||||
Clzip \- LZMA lossless data compressor.
|
||||
Clzip is a C language version of lzip, fully compatible with lzip 1.4 or
|
||||
newer. As clzip is written in C, it may be easier to integrate in
|
||||
applications like package managers, embedded devices, or systems lacking
|
||||
a C++ compiler.
|
||||
.PP
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip \fB\-0\fR)
|
||||
or compress most files more than bzip2 (lzip \fB\-9\fR). Decompression speed is
|
||||
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
|
||||
from a data recovery perspective. Lzip has been designed, written and
|
||||
tested with great care to replace gzip and bzip2 as the standard
|
||||
general\-purpose compressed format for unix\-like systems.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-h\fR, \fB\-\-help\fR
|
||||
|
@@ -52,7 +63,7 @@ suppress all messages
|
|||
set dictionary size limit in bytes [8 MiB]
|
||||
.TP
|
||||
\fB\-S\fR, \fB\-\-volume\-size=\fR<bytes>
|
||||
set volume size limit in bytes, implies \fB\-k\fR
|
||||
set volume size limit in bytes
|
||||
.TP
|
||||
\fB\-t\fR, \fB\-\-test\fR
|
||||
test compressed file integrity
|
||||
|
@@ -93,7 +104,7 @@ Report bugs to lzip\-bug@nongnu.org
|
|||
.br
|
||||
Clzip home page: http://www.nongnu.org/lzip/clzip.html
|
||||
.SH COPYRIGHT
|
||||
Copyright \(co 2018 Antonio Diaz Diaz.
|
||||
Copyright \(co 2019 Antonio Diaz Diaz.
|
||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
|
|
316
doc/clzip.info
|
@@ -11,7 +11,7 @@ File: clzip.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Clzip Manual
|
||||
************
|
||||
|
||||
This manual is for Clzip (version 1.10, 6 February 2018).
|
||||
This manual is for Clzip (version 1.11, 3 January 2019).
|
||||
|
||||
* Menu:
|
||||
|
||||
|
@@ -29,7 +29,7 @@ This manual is for Clzip (version 1.10, 6 February 2018).
|
|||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to
|
||||
copy, distribute and modify it.
|
||||
|
@@ -40,14 +40,14 @@ File: clzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
|
|||
1 Introduction
|
||||
**************
|
||||
|
||||
Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
|
||||
newer. As clzip is written in C, it may be easier to integrate in
|
||||
applications like package managers, embedded devices, or systems lacking
|
||||
a C++ compiler.
|
||||
Clzip is a C language version of lzip, fully compatible with lzip 1.4
|
||||
or newer. As clzip is written in C, it may be easier to integrate in
|
||||
applications like package managers, embedded devices, or systems
|
||||
lacking a C++ compiler.
|
||||
|
||||
Lzip is a lossless data compressor with a user interface similar to
|
||||
the one of gzip or bzip2. Lzip can compress about as fast as gzip
|
||||
(lzip -0), or compress most files more than bzip2 (lzip -9).
|
||||
(lzip -0) or compress most files more than bzip2 (lzip -9).
|
||||
Decompression speed is intermediate between gzip and bzip2. Lzip is
|
||||
better than gzip and bzip2 from a data recovery perspective.
|
||||
|
||||
|
@@ -88,15 +88,15 @@ microscopic. Be aware, though, that the check occurs upon
|
|||
decompression, so it can only tell you that something is wrong. It
|
||||
can't help you recover the original uncompressed data.
|
||||
|
||||
Clzip uses the same well-defined exit status values used by lzip and
|
||||
bzip2, which makes it safer than compressors returning ambiguous warning
|
||||
values (like gzip) when it is used as a back end for other programs like
|
||||
tar or zutils.
|
||||
Clzip uses the same well-defined exit status values used by lzip,
|
||||
which makes it safer than compressors returning ambiguous warning
|
||||
values (like gzip) when it is used as a back end for other programs
|
||||
like tar or zutils.
|
||||
|
||||
Clzip will automatically use the smallest possible dictionary size
|
||||
for each file without exceeding the given limit. Keep in mind that the
|
||||
decompression memory requirement is affected at compression time by the
|
||||
choice of dictionary size limit.
|
||||
Clzip will automatically use for each file the largest dictionary
|
||||
size that does not exceed neither the file size nor the limit given.
|
||||
Keep in mind that the decompression memory requirement is affected at
|
||||
compression time by the choice of dictionary size limit.
|
||||
|
||||
The amount of memory required for compression is about 1 or 2 times
|
||||
the dictionary size limit (1 if input file size is less than dictionary
|
||||
|
@@ -116,7 +116,7 @@ anyothername becomes anyothername.out
|
|||
|
||||
(De)compressing a file is much like copying or moving it; therefore
|
||||
clzip preserves the access and modification dates, permissions, and,
|
||||
when possible, ownership of the file just as "cp -p" does. (If the user
|
||||
when possible, ownership of the file just as 'cp -p' does. (If the user
|
||||
ID or the group ID can't be duplicated, the file permission bits
|
||||
S_ISUID and S_ISGID are cleared).
|
||||
|
||||
|
@@ -214,6 +214,7 @@ command line.
|
|||
'-V'
|
||||
'--version'
|
||||
Print the version number of clzip on the standard output and exit.
|
||||
This version number should be included in all bug reports.
|
||||
|
||||
'-a'
|
||||
'--trailing-error'
|
||||
|
@@ -298,12 +299,14 @@ command line.
|
|||
'-s BYTES'
|
||||
'--dictionary-size=BYTES'
|
||||
When compressing, set the dictionary size limit in bytes. Clzip
|
||||
will use the smallest possible dictionary size for each file
|
||||
without exceeding this limit. Valid values range from 4 KiB to
|
||||
512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
|
||||
2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
|
||||
the specified size does not match one of the valid sizes, it will
|
||||
be rounded upwards by adding up to (BYTES / 8) to it.
|
||||
will use for each file the largest dictionary size that does not
|
||||
exceed neither the file size nor this limit. Valid values range
|
||||
from 4 KiB to 512 MiB. Values 12 to 29 are interpreted as powers
|
||||
of two, meaning 2^12 to 2^29 bytes. Dictionary sizes are quantized
|
||||
so that they can be coded in just one byte (*note
|
||||
coded-dict-size::). If the specified size does not match one of
|
||||
the valid sizes, it will be rounded upwards by adding up to
|
||||
(BYTES / 8) to it.
|
||||
|
||||
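The upward rounding described for '-s' can be sketched as follows. This is a minimal illustration, assuming the one-byte coding described under 'DS (coded dictionary size, 1 byte)'; the names `real_bits` and `code_dict_size` are hypothetical and are not presented as clzip's API.

```cpp
#include <cassert>
#include <cstdint>

// Limits taken from the manual text: 4 KiB to 512 MiB.
const unsigned min_dictionary_size = 1u << 12;
const unsigned max_dictionary_size = 1u << 29;

// Number of bits needed to represent 'value' (0 for value == 0).
int real_bits( unsigned value )
  {
  int bits = 0;
  while( value > 0 ) { value >>= 1; ++bits; }
  return bits;
  }

// Hypothetical helper: returns the coded byte of the smallest valid
// dictionary size that is >= 'size'.
uint8_t code_dict_size( const unsigned size )
  {
  assert( size >= min_dictionary_size && size <= max_dictionary_size );
  uint8_t coded = real_bits( size - 1 );       // log2 of the base size
  const unsigned base_size = 1u << coded;
  const unsigned fraction = base_size / 16;
  // Subtract the largest number of sixteenths that still leaves room.
  for( int i = 7; i >= 1; --i )
    if( base_size - ( i * fraction ) >= size )
      { coded |= ( i << 5 ); break; }
  return coded;
  }
```

Under these assumptions a request of 300000 bytes is coded as 0xD3 (320 KiB), an increase of 27680 bytes, which is below 300000 / 8.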
For maximum compression you should use a dictionary size limit as
|
||||
large as possible, but keep in mind that the decompression memory
|
||||
|
@ -342,27 +345,32 @@ command line.
|
|||
Two or more '-v' options show the progress of (de)compression.
|
||||
|
||||
'-0 .. -9'
|
||||
Set the compression parameters (dictionary size and match length
|
||||
limit) as shown in the table below. The default compression level
|
||||
is '-6'. Note that '-9' can be much slower than '-0'. These
|
||||
options have no effect when decompressing, testing or listing.
|
||||
Compression level. Set the compression parameters (dictionary size
|
||||
and match length limit) as shown in the table below. The default
|
||||
compression level is '-6', equivalent to '-s8MiB -m36'. Note that
|
||||
'-9' can be much slower than '-0'. These options have no effect
|
||||
when decompressing, testing or listing.
|
||||
|
||||
The bidimensional parameter space of LZMA can't be mapped to a
|
||||
linear scale optimal for all files. If your files are large, very
|
||||
repetitive, etc., you may need to use the '--dictionary-size' and
|
||||
'--match-length' options directly to achieve optimal performance.
|
||||
|
||||
Level Dictionary size Match length limit
|
||||
-0 64 KiB 16 bytes
|
||||
-1 1 MiB 5 bytes
|
||||
-2 1.5 MiB 6 bytes
|
||||
-3 2 MiB 8 bytes
|
||||
-4 3 MiB 12 bytes
|
||||
-5 4 MiB 20 bytes
|
||||
-6 8 MiB 36 bytes
|
||||
-7 16 MiB 68 bytes
|
||||
-8 24 MiB 132 bytes
|
||||
-9 32 MiB 273 bytes
|
||||
If several compression levels or '-s' or '-m' options are given,
|
||||
the last setting is used. For example, '-9 -s64MiB' is equivalent
|
||||
to '-s64MiB -m273'.
|
||||
|
||||
Level Dictionary size (-s) Match length limit (-m)
|
||||
-0 64 KiB 16 bytes
|
||||
-1 1 MiB 5 bytes
|
||||
-2 1.5 MiB 6 bytes
|
||||
-3 2 MiB 8 bytes
|
||||
-4 3 MiB 12 bytes
|
||||
-5 4 MiB 20 bytes
|
||||
-6 8 MiB 36 bytes
|
||||
-7 16 MiB 68 bytes
|
||||
-8 24 MiB 132 bytes
|
||||
-9 32 MiB 273 bytes
|
||||
|
||||
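The "last setting is used" rule can be illustrated with a small sketch. The table values are copied from the level table above; the types `Option` and `Lzma_options` and the function `resolve_options` are hypothetical names chosen for illustration, not clzip's actual option parser.

```cpp
#include <cassert>

struct Lzma_options { int dictionary_size; int match_len_limit; };

// Index = compression level, values from the table above.
const Lzma_options level_table[10] = {
  { 1 << 16,  16 },    // -0   64 KiB
  { 1 << 20,   5 },    // -1    1 MiB
  { 3 << 19,   6 },    // -2  1.5 MiB
  { 1 << 21,   8 },    // -3    2 MiB
  { 3 << 20,  12 },    // -4    3 MiB
  { 1 << 22,  20 },    // -5    4 MiB
  { 1 << 23,  36 },    // -6    8 MiB (default)
  { 1 << 24,  68 },    // -7   16 MiB
  { 3 << 23, 132 },    // -8   24 MiB
  { 1 << 25, 273 } };  // -9   32 MiB

enum Option_kind { opt_level, opt_dict_size, opt_match_len };
struct Option { Option_kind kind; int value; };

// Applies the options in command-line order; later settings override
// earlier ones. A level sets both parameters, '-s' and '-m' set one each.
Lzma_options resolve_options( const Option * const opts, const int count )
  {
  Lzma_options result = level_table[6];         // default is '-6'
  for( int i = 0; i < count; ++i )
    switch( opts[i].kind )
      {
      case opt_level:
        assert( opts[i].value >= 0 && opts[i].value <= 9 );
        result = level_table[opts[i].value]; break;
      case opt_dict_size: result.dictionary_size = opts[i].value; break;
      case opt_match_len: result.match_len_limit = opts[i].value; break;
      }
  return result;
  }
```

In this sketch '-9 -s64MiB' yields a 64 MiB dictionary with a match length limit of 273, matching the equivalence stated above, while '-s64MiB -9' would yield the plain '-9' parameters because the level comes last.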
'--fast'
|
||||
'--best'
|
||||
|
@ -409,10 +417,10 @@ is to make it so complicated that there are no obvious deficiencies. The
|
|||
first method is far more difficult.
|
||||
-- C.A.R. Hoare
|
||||
|
||||
Lzip has been designed, written and tested with great care to be the
|
||||
standard general-purpose compressor for unix-like systems. This chapter
|
||||
describes the lessons learned from previous compressors (gzip and
|
||||
bzip2), and their application to the design of lzip.
|
||||
Lzip has been designed, written and tested with great care to replace
|
||||
gzip and bzip2 as the standard general-purpose compressed format for
|
||||
unix-like systems. This chapter describes the lessons learned from
|
||||
these previous formats, and their application to the design of lzip.
|
||||
|
||||
|
||||
4.1 Format design
|
||||
|
@ -455,17 +463,20 @@ error detection. Any distance larger than the dictionary size acts as a
|
|||
forbidden symbol, allowing the decompressor to detect the approximate
|
||||
position of errors, and leaving very little work for the check sequence
|
||||
(CRC and data sizes) in the detection of errors. Lzip is usually able
|
||||
to detect all posible bit flips in the compressed data without
|
||||
to detect all possible bit flips in the compressed data without
|
||||
resorting to the check sequence. It would be difficult to write an
|
||||
automatic recovery tool like lziprecover for the gzip format. And, as
|
||||
far as I know, it has never been written.
|
||||
|
||||
Lzip, like gzip and bzip2, uses a CRC32 to check the integrity of the
|
||||
decompressed data because it provides more accurate error detection than
|
||||
CRC64 up to a compressed size of about 16 GiB, a size larger than that
|
||||
of most files. In the case of lzip, the additional detection capability
|
||||
of the decompressor reduces the probability of undetected errors more
|
||||
than a million times beyond what the CRC32 alone provides.
|
||||
decompressed data because it provides optimal accuracy in the detection
|
||||
of errors up to a compressed size of about 16 GiB, a size larger than
|
||||
that of most files. In the case of lzip, the additional detection
|
||||
capability of the decompressor reduces the probability of undetected
|
||||
errors about four million times more, resulting in a combined integrity
|
||||
checking optimally accurate for any member size produced by lzip.
|
||||
Preliminary results suggest that the lzip format is safe enough to be
|
||||
used in critical safety avionics systems.
|
||||
|
||||
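For reference, the kind of CRC32 mentioned above can be computed as in the following sketch. It assumes the reflected polynomial 0xEDB88320 with initial value and final XOR of 0xFFFFFFFF, as used by gzip; the function name `crc32_of` is hypothetical, and the bit-by-bit loop is chosen for brevity rather than speed.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC32 (reflected polynomial 0xEDB88320). A table-driven
// version produces the same values much faster.
uint32_t crc32_of( const uint8_t * const buffer, const size_t size )
  {
  uint32_t crc = 0xFFFFFFFFU;
  for( size_t i = 0; i < size; ++i )
    {
    crc ^= buffer[i];
    for( int k = 0; k < 8; ++k )
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0xEDB88320U : ( crc >> 1 );
    }
  return crc ^ 0xFFFFFFFFU;
  }
```

The standard check value for this CRC is 0xCBF43926 for the nine ASCII bytes "123456789".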
The lzip format is designed for long-term archiving. Therefore it
|
||||
excludes any unneeded features that may interfere with the future
|
||||
|
@ -520,7 +531,7 @@ extraction of the decompressed data.
|
|||
Bzip2 does not store the uncompressed size of the file.
|
||||
|
||||
The lzip format provides a 64-bit field for the uncompressed size.
|
||||
Additionaly, lzip produces multimember output automatically when
|
||||
Additionally, lzip produces multimember output automatically when
|
||||
the size is too large for a single member, allowing for an
|
||||
unlimited uncompressed size.
|
||||
|
||||
|
@ -568,9 +579,9 @@ extraction of the decompressed data.
|
|||
(lziprecover)Unzcrash.
|
||||
|
||||
'Dictionary size'
|
||||
Lzip automatically uses the smallest possible dictionary size for
|
||||
each file. In addition to reducing the amount of memory required
|
||||
for decompression, this feature also minimizes the probability of
|
||||
Lzip automatically adapts the dictionary size to the size of each
|
||||
file. In addition to reducing the amount of memory required for
|
||||
decompression, this feature also minimizes the probability of
|
||||
being affected by RAM errors during compression.
|
||||
|
||||
'Exit status'
|
||||
|
@ -624,11 +635,11 @@ additional information before, between, or after them.
|
|||
|
||||
'DS (coded dictionary size, 1 byte)'
|
||||
The dictionary size is calculated by taking a power of 2 (the base
|
||||
size) and substracting from it a fraction between 0/16 and 7/16 of
|
||||
size) and subtracting from it a fraction between 0/16 and 7/16 of
|
||||
the base size.
|
||||
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to
|
||||
substract from the base size to obtain the dictionary size.
|
||||
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
|
||||
from the base size to obtain the dictionary size.
|
||||
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
|
||||
Valid values for dictionary size range from 4 KiB to 512 MiB.
|
||||
|
||||
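The decoding rule for the DS byte can be sketched as follows. This is a minimal illustration of the description above; the function name `decode_dict_size` and the convention of returning 0 for an invalid byte are assumptions made for the example.

```cpp
#include <cstdint>

// Returns the dictionary size in bytes coded in 'coded', or 0 if the
// byte does not describe a size between 4 KiB and 512 MiB.
unsigned decode_dict_size( const uint8_t coded )
  {
  const int base_log2 = coded & 0x1F;           // bits 4-0
  const unsigned numerator = coded >> 5;        // bits 7-5
  if( base_log2 < 12 || base_log2 > 29 ) return 0;
  const unsigned base_size = 1u << base_log2;
  const unsigned size = base_size - numerator * ( base_size / 16 );
  if( size < ( 1u << 12 ) ) return 0;           // below 4 KiB
  return size;
  }
```

With the example byte from the text, 0xD3 gives a base size of 2^19 and a numerator of 6, so the result is 320 KiB.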
|
@ -767,7 +778,7 @@ reusing a recently used distance). There are 7 different coding
|
|||
sequences:
|
||||
|
||||
Bit sequence Name Description
|
||||
---------------------------------------------------------------------------
|
||||
------------------------------------------------------------------------
|
||||
0 + byte literal literal byte
|
||||
1 + 0 + len + dis match distance-length pair
|
||||
1 + 1 + 0 + 0 shortrep 1 byte match at latest used distance
|
||||
|
@ -787,7 +798,7 @@ order, from MSB to LSB, except where noted otherwise.
|
|||
Lengths (the 'len' in the table above) are coded as follows:
|
||||
|
||||
Bit sequence Description
|
||||
--------------------------------------------------------------------------
|
||||
------------------------------------------------------------------------
|
||||
0 + 3 bits lengths from 2 to 9
|
||||
1 + 0 + 3 bits lengths from 10 to 17
|
||||
1 + 1 + 8 bits lengths from 18 to 273
|
||||
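The three length ranges in the table above follow from simple offsets. The sketch below shows only the arithmetic, assuming the prefix bits and the following 3-bit or 8-bit value have already been decoded; `decoded_length` is a hypothetical helper, not the `decode_len` function of the reference source.

```cpp
#include <cassert>

const int min_match_len = 2;

// 'choice1' and 'choice2' are the prefix bits of the length code;
// 'value' is the integer formed by the 3 or 8 bits that follow.
int decoded_length( const bool choice1, const bool choice2, const int value )
  {
  if( !choice1 )                        // 0 + 3 bits
    { assert( value >= 0 && value < 8 ); return min_match_len + value; }
  if( !choice2 )                        // 1 + 0 + 3 bits
    { assert( value >= 0 && value < 8 ); return min_match_len + 8 + value; }
  assert( value >= 0 && value < 256 );  // 1 + 1 + 8 bits
  return min_match_len + 16 + value;
  }
```

The largest code, prefix 1 + 1 with an 8-bit value of 255, gives 2 + 16 + 255 = 273, the maximum match length.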
|
@ -828,7 +839,7 @@ order (from LSB to MSB). For distances >= 128, the 'direct_bits - 4'
|
|||
part is coded with fixed 0.5 probability.
|
||||
|
||||
Bit sequence Description
|
||||
--------------------------------------------------------------------------
|
||||
------------------------------------------------------------------------
|
||||
slot distances from 0 to 3
|
||||
slot + direct_bits distances from 4 to 127
|
||||
slot + (direct_bits - 4) + 4 bits distances from 128 to 2^32 - 1
|
||||
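The relation between a distance slot and the distances it covers can be sketched as follows. The formulas for `direct_bits` and the base distance are the ones used in 'decode_member' in the reference source; the struct `Distance_range`, the function `slot_range`, and the assumption of 64 slots (enough to reach 2^32 - 1) are introduced here for illustration only.

```cpp
#include <cassert>
#include <cstdint>

struct Distance_range { uint32_t first; uint32_t last; };

// Returns the first and last distance representable with 'slot'.
Distance_range slot_range( const int slot )
  {
  assert( slot >= 0 && slot < 64 );     // assumes a 6-bit slot
  Distance_range r;
  if( slot < 4 )                        // slots 0 to 3 are the distance
    { r.first = r.last = (uint32_t)slot; return r; }
  const int direct_bits = ( slot >> 1 ) - 1;
  r.first = (uint32_t)( 2 | ( slot & 1 ) ) << direct_bits;
  r.last = r.first + ( ( (uint32_t)1 << direct_bits ) - 1 );
  return r;
  }
```

Under these assumptions slot 13 covers distances 96 to 127, slot 14 starts at 128, and slot 63 ends at 2^32 - 1, which agrees with the three rows of the table.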
|
@ -864,7 +875,7 @@ byte. 'rep' is any one of 'rep0', 'rep1', 'rep2' or 'rep3'. The types
|
|||
of previous sequences corresponding to each state are:
|
||||
|
||||
State Types of previous sequences
|
||||
--------------------------------------------------------
|
||||
------------------------------------------------------
|
||||
0 literal, literal, literal
|
||||
1 match, literal, literal
|
||||
2 rep or (!literal, shortrep), literal, literal
|
||||
|
@ -881,24 +892,24 @@ State Types of previous sequences
|
|||
|
||||
The contexts for decoding the type of coding sequence are:
|
||||
|
||||
Name Indices Used when
|
||||
---------------------------------------------------------------------------
|
||||
bm_match state, pos_state sequence start
|
||||
bm_rep state after sequence 1
|
||||
bm_rep0 state after sequence 11
|
||||
bm_rep1 state after sequence 111
|
||||
bm_rep2 state after sequence 1111
|
||||
bm_len state, pos_state after sequence 110
|
||||
Name Indices Used when
|
||||
-----------------------------------------------------------------------
|
||||
bm_match state, pos_state sequence start
|
||||
bm_rep state after sequence 1
|
||||
bm_rep0 state after sequence 11
|
||||
bm_rep1 state after sequence 111
|
||||
bm_rep2 state after sequence 1111
|
||||
bm_len state, pos_state after sequence 110
|
||||
|
||||
|
||||
The contexts for decoding distances are:
|
||||
|
||||
Name Indices Used when
|
||||
---------------------------------------------------------------------------
|
||||
bm_dis_slot len_state, bit tree distance start
|
||||
bm_dis reverse bit tree after slots 4 to 13
|
||||
bm_align reverse bit tree for distances >= 128, after
|
||||
fixed probability bits
|
||||
Name Indices Used when
|
||||
------------------------------------------------------------------------
|
||||
bm_dis_slot len_state, bit tree distance start
|
||||
bm_dis reverse bit tree after slots 4 to 13
|
||||
bm_align reverse bit tree for distances >= 128, after fixed
|
||||
probability bits
|
||||
|
||||
|
||||
There are two separate sets of contexts for lengths ('Len_model' in
|
||||
|
@ -906,7 +917,7 @@ the source). One for normal matches, the other for repeated matches. The
|
|||
contexts in each Len_model are (see 'decode_len' in the source):
|
||||
|
||||
Name Indices Used when
|
||||
---------------------------------------------------------------------------
|
||||
------------------------------------------------------------------------
|
||||
choice1 none length start
|
||||
choice2 none after sequence 1
|
||||
bm_low pos_state, bit tree after sequence 0
|
||||
|
@ -1013,7 +1024,11 @@ compressed file (bugs in the system libraries, memory errors, etc).
|
|||
Therefore, if the data you are going to compress are important, give the
|
||||
'--keep' option to clzip and don't remove the original file until you
|
||||
verify the compressed file with a command like
|
||||
'clzip -cd file.lz | cmp file -'.
|
||||
'clzip -cd file.lz | cmp file -'. Most RAM errors happening during
|
||||
compression can only be detected by comparing the compressed file with
|
||||
the original because the corruption happens before clzip compresses the
|
||||
RAM contents, resulting in a valid compressed file containing wrong
|
||||
data.
|
||||
|
||||
|
||||
Example 1: Replace a regular file with its compressed version 'file.lz'
|
||||
|
@ -1106,7 +1121,7 @@ Appendix A Reference source code
|
|||
********************************
|
||||
|
||||
/* Lzd - Educational decompressor for the lzip format
|
||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -1136,7 +1151,7 @@ Appendix A Reference source code
|
|||
#include <cstring>
|
||||
#include <stdint.h>
|
||||
#include <unistd.h>
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#include <fcntl.h>
|
||||
#include <io.h>
|
||||
#endif
|
||||
|
@ -1237,9 +1252,9 @@ public:
|
|||
const CRC32 crc32;
|
||||
|
||||
|
||||
typedef uint8_t File_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
|
||||
typedef uint8_t Lzip_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
|
||||
|
||||
typedef uint8_t File_trailer[20];
|
||||
typedef uint8_t Lzip_trailer[20];
|
||||
// 0-3 CRC32 of the uncompressed data
|
||||
// 4-11 size of the uncompressed data
|
||||
// 12-19 member size including header and trailer
|
||||
|
@ -1433,6 +1448,7 @@ bool LZ_decoder::decode_member() // Returns false if error
|
|||
const int pos_state = data_position() & pos_state_mask;
|
||||
if( rdec.decode_bit( bm_match[state()][pos_state] ) == 0 ) // 1st bit
|
||||
{
|
||||
// literal byte
|
||||
const uint8_t prev_byte = peek( 0 );
|
||||
const int literal_state = prev_byte >> ( 8 - literal_context_bits );
|
||||
Bit_model * const bm = bm_literal[literal_state];
|
||||
|
@ -1441,67 +1457,66 @@ bool LZ_decoder::decode_member() // Returns false if error
|
|||
else
|
||||
put_byte( rdec.decode_matched( bm, peek( rep0 ) ) );
|
||||
state.set_char();
|
||||
continue;
|
||||
}
|
||||
else // match or repeated match
|
||||
// match or repeated match
|
||||
int len;
|
||||
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
|
||||
{
|
||||
int len;
|
||||
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
|
||||
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
|
||||
{
|
||||
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
|
||||
{
|
||||
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
|
||||
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
|
||||
}
|
||||
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
|
||||
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
|
||||
}
|
||||
else
|
||||
{
|
||||
unsigned distance;
|
||||
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
|
||||
distance = rep1;
|
||||
else
|
||||
{
|
||||
unsigned distance;
|
||||
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
|
||||
distance = rep1;
|
||||
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
|
||||
distance = rep2;
|
||||
else
|
||||
{
|
||||
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
|
||||
distance = rep2;
|
||||
else
|
||||
{ distance = rep3; rep3 = rep2; }
|
||||
rep2 = rep1;
|
||||
}
|
||||
rep1 = rep0;
|
||||
rep0 = distance;
|
||||
{ distance = rep3; rep3 = rep2; }
|
||||
rep2 = rep1;
|
||||
}
|
||||
state.set_rep();
|
||||
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
|
||||
rep1 = rep0;
|
||||
rep0 = distance;
|
||||
}
|
||||
else // match
|
||||
{
|
||||
rep3 = rep2; rep2 = rep1; rep1 = rep0;
|
||||
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
|
||||
const int len_state = std::min( len - min_match_len, len_states - 1 );
|
||||
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
|
||||
if( rep0 >= start_dis_model )
|
||||
{
|
||||
const unsigned dis_slot = rep0;
|
||||
const int direct_bits = ( dis_slot >> 1 ) - 1;
|
||||
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
|
||||
if( dis_slot < end_dis_model )
|
||||
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
|
||||
direct_bits );
|
||||
else
|
||||
{
|
||||
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
|
||||
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
|
||||
if( rep0 == 0xFFFFFFFFU ) // marker found
|
||||
{
|
||||
flush_data();
|
||||
return ( len == min_match_len ); // End Of Stream marker
|
||||
}
|
||||
}
|
||||
}
|
||||
state.set_match();
|
||||
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
|
||||
{ flush_data(); return false; }
|
||||
}
|
||||
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
|
||||
state.set_rep();
|
||||
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
|
||||
}
|
||||
else // match
|
||||
{
|
||||
rep3 = rep2; rep2 = rep1; rep1 = rep0;
|
||||
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
|
||||
const int len_state = std::min( len - min_match_len, len_states - 1 );
|
||||
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
|
||||
if( rep0 >= start_dis_model )
|
||||
{
|
||||
const unsigned dis_slot = rep0;
|
||||
const int direct_bits = ( dis_slot >> 1 ) - 1;
|
||||
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
|
||||
if( dis_slot < end_dis_model )
|
||||
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
|
||||
direct_bits );
|
||||
else
|
||||
{
|
||||
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
|
||||
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
|
||||
if( rep0 == 0xFFFFFFFFU ) // marker found
|
||||
{
|
||||
flush_data();
|
||||
return ( len == min_match_len ); // End Of Stream marker
|
||||
}
|
||||
}
|
||||
}
|
||||
state.set_match();
|
||||
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
|
||||
{ flush_data(); return false; }
|
||||
}
|
||||
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
|
||||
}
|
||||
flush_data();
|
||||
return false;
|
||||
|
@ -1519,7 +1534,7 @@ int main( const int argc, const char * const argv[] )
|
|||
"It is not safe to use lzd for any real work.\n"
|
||||
"\nUsage: %s < file.lz > file\n", argv[0] );
|
||||
std::printf( "Lzd decompresses from standard input to standard output.\n"
|
||||
"\nCopyright (C) 2018 Antonio Diaz Diaz.\n"
|
||||
"\nCopyright (C) 2019 Antonio Diaz Diaz.\n"
|
||||
"This is free software: you are free to change and redistribute it.\n"
|
||||
"There is NO WARRANTY, to the extent permitted by law.\n"
|
||||
"Report bugs to lzip-bug@nongnu.org\n"
|
||||
|
@ -1527,14 +1542,14 @@ int main( const int argc, const char * const argv[] )
|
|||
return 0;
|
||||
}
|
||||
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
|
||||
setmode( fileno( stdin ), O_BINARY );
|
||||
setmode( fileno( stdout ), O_BINARY );
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
setmode( STDIN_FILENO, O_BINARY );
|
||||
setmode( STDOUT_FILENO, O_BINARY );
|
||||
#endif
|
||||
|
||||
for( bool first_member = true; ; first_member = false )
|
||||
{
|
||||
File_header header; // verify header
|
||||
Lzip_header header; // verify header
|
||||
for( int i = 0; i < 6; ++i ) header[i] = std::getc( stdin );
|
||||
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
|
||||
{
|
||||
|
@ -1553,7 +1568,7 @@ int main( const int argc, const char * const argv[] )
|
|||
if( !decoder.decode_member() )
|
||||
{ std::fputs( "Data error\n", stderr ); return 2; }
|
||||
|
||||
File_trailer trailer; // verify trailer
|
||||
Lzip_trailer trailer; // verify trailer
|
||||
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
|
||||
unsigned crc = 0;
|
||||
for( int i = 3; i >= 0; --i ) { crc <<= 8; crc += trailer[i]; }
|
||||
|
@ -1598,20 +1613,21 @@ Concept index
|
|||
|
||||
Tag Table:
|
||||
Node: Top210
|
||||
Node: Introduction1210
|
||||
Node: Output6491
|
||||
Node: Invoking clzip8011
|
||||
Ref: --trailing-error8577
|
||||
Node: Quality assurance16230
|
||||
Node: File format24640
|
||||
Node: Algorithm27045
|
||||
Node: Stream format29875
|
||||
Node: Trailing data40616
|
||||
Node: Examples42894
|
||||
Ref: concat-example44076
|
||||
Node: Problems45121
|
||||
Node: Reference source code45657
|
||||
Node: Concept index59974
|
||||
Node: Introduction1209
|
||||
Node: Output6498
|
||||
Node: Invoking clzip8018
|
||||
Ref: --trailing-error8648
|
||||
Node: Quality assurance16666
|
||||
Node: File format25271
|
||||
Ref: coded-dict-size26564
|
||||
Node: Algorithm27674
|
||||
Node: Stream format30504
|
||||
Node: Trailing data41156
|
||||
Node: Examples43434
|
||||
Ref: concat-example44866
|
||||
Node: Problems45911
|
||||
Node: Reference source code46447
|
||||
Node: Concept index60660
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|
251
doc/clzip.texi
|
@ -6,8 +6,8 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 6 February 2018
|
||||
@set VERSION 1.10
|
||||
@set UPDATED 3 January 2019
|
||||
@set VERSION 1.11
|
||||
|
||||
@dircategory Data Compression
|
||||
@direntry
|
||||
|
@ -50,7 +50,7 @@ This manual is for Clzip (version @value{VERSION}, @value{UPDATED}).
|
|||
@end menu
|
||||
|
||||
@sp 1
|
||||
Copyright @copyright{} 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright @copyright{} 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission
|
||||
to copy, distribute and modify it.
|
||||
|
@ -60,20 +60,20 @@ to copy, distribute and modify it.
|
|||
@chapter Introduction
|
||||
@cindex introduction
|
||||
|
||||
Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
|
||||
newer. As clzip is written in C, it may be easier to integrate in
|
||||
applications like package managers, embedded devices, or systems lacking
|
||||
a C++ compiler.
|
||||
@uref{http://www.nongnu.org/lzip/clzip.html,,Clzip} is a C language version
|
||||
of lzip, fully compatible with @w{lzip 1.4} or newer. As clzip is written in
|
||||
C, it may be easier to integrate in applications like package managers,
|
||||
embedded devices, or systems lacking a C++ compiler.
|
||||
|
||||
Lzip is a lossless data compressor with a user interface similar to the
|
||||
one of gzip or bzip2. Lzip can compress about as fast as gzip
|
||||
@w{(lzip -0)}, or compress most files more than bzip2 @w{(lzip -9)}.
|
||||
Decompression speed is intermediate between gzip and bzip2. Lzip is
|
||||
better than gzip and bzip2 from a data recovery perspective.
|
||||
@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} is a lossless data
|
||||
compressor with a user interface similar to the one of gzip or bzip2. Lzip
|
||||
can compress about as fast as gzip @w{(lzip -0)} or compress most files more
|
||||
than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between gzip
|
||||
and bzip2. Lzip is better than gzip and bzip2 from a data recovery
|
||||
perspective.
|
||||
|
||||
The lzip file format is designed for data sharing and long-term
|
||||
archiving, taking into account both data integrity and decoder
|
||||
availability:
|
||||
The lzip file format is designed for data sharing and long-term archiving,
|
||||
taking into account both data integrity and decoder availability:
|
||||
|
||||
@itemize @bullet
|
||||
@item
|
||||
|
@ -116,15 +116,14 @@ though, that the check occurs upon decompression, so it can only tell
|
|||
you that something is wrong. It can't help you recover the original
|
||||
uncompressed data.
|
||||
|
||||
Clzip uses the same well-defined exit status values used by lzip and
|
||||
bzip2, which makes it safer than compressors returning ambiguous warning
|
||||
values (like gzip) when it is used as a back end for other programs like
|
||||
tar or zutils.
|
||||
Clzip uses the same well-defined exit status values used by lzip, which
|
||||
makes it safer than compressors returning ambiguous warning values (like
|
||||
gzip) when it is used as a back end for other programs like tar or zutils.
|
||||
|
||||
Clzip will automatically use the smallest possible dictionary size for
|
||||
each file without exceeding the given limit. Keep in mind that the
|
||||
decompression memory requirement is affected at compression time by the
|
||||
choice of dictionary size limit.
|
||||
Clzip will automatically use for each file the largest dictionary size
|
||||
that exceeds neither the file size nor the limit given. Keep in
|
||||
mind that the decompression memory requirement is affected at
|
||||
compression time by the choice of dictionary size limit.
|
||||
|
||||
The amount of memory required for compression is about 1 or 2 times the
|
||||
dictionary size limit (1 if input file size is less than dictionary size
|
||||
|
@ -146,7 +145,7 @@ file from that of the compressed file as follows:
|
|||
|
||||
(De)compressing a file is much like copying or moving it; therefore clzip
|
||||
preserves the access and modification dates, permissions, and, when
|
||||
possible, ownership of the file just as "cp -p" does. (If the user ID or
|
||||
possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
|
||||
the group ID can't be duplicated, the file permission bits S_ISUID and
|
||||
S_ISGID are cleared).
|
||||
|
||||
|
@ -252,6 +251,7 @@ Print an informative help message describing the options and exit.
|
|||
@item -V
|
||||
@itemx --version
|
||||
Print the version number of clzip on the standard output and exit.
|
||||
This version number should be included in all bug reports.
|
||||
|
||||
@anchor{--trailing-error}
|
||||
@item -a
|
||||
|
@ -333,12 +333,13 @@ Quiet operation. Suppress all messages.
|
|||
@item -s @var{bytes}
|
||||
@itemx --dictionary-size=@var{bytes}
|
||||
When compressing, set the dictionary size limit in bytes. Clzip will use
|
||||
the smallest possible dictionary size for each file without exceeding
|
||||
this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. Values 12
|
||||
to 29 are interpreted as powers of two, meaning 2^12 to 2^29 bytes. Note
|
||||
that dictionary sizes are quantized. If the specified size does not
|
||||
match one of the valid sizes, it will be rounded upwards by adding up to
|
||||
@w{(@var{bytes} / 8)} to it.
|
||||
for each file the largest dictionary size that exceeds neither
|
||||
the file size nor this limit. Valid values range from @w{4 KiB} to
|
||||
@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
|
||||
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
|
||||
coded in just one byte (@pxref{coded-dict-size}). If the specified size
|
||||
does not match one of the valid sizes, it will be rounded upwards by
|
||||
adding up to @w{(@var{bytes} / 8)} to it.
|
||||
|
||||
For maximum compression you should use a dictionary size limit as large
|
||||
as possible, but keep in mind that the decompression memory requirement
|
||||
|
@ -376,18 +377,23 @@ ASCII characters.@*
|
|||
Two or more @samp{-v} options show the progress of (de)compression.
|
||||
|
||||
@item -0 .. -9
|
||||
Set the compression parameters (dictionary size and match length limit)
|
||||
as shown in the table below. The default compression level is @samp{-6}.
|
||||
Note that @samp{-9} can be much slower than @samp{-0}. These options
|
||||
have no effect when decompressing, testing or listing.
|
||||
Compression level. Set the compression parameters (dictionary size and
|
||||
match length limit) as shown in the table below. The default compression
|
||||
level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
|
||||
@samp{-9} can be much slower than @samp{-0}. These options have no
|
||||
effect when decompressing, testing or listing.
|
||||
|
||||
The bidimensional parameter space of LZMA can't be mapped to a linear
|
||||
scale optimal for all files. If your files are large, very repetitive,
|
||||
etc., you may need to use the @samp{--dictionary-size} and
|
||||
@samp{--match-length} options directly to achieve optimal performance.
|
||||
|
||||
@multitable {Level} {Dictionary size} {Match length limit}
|
||||
@item Level @tab Dictionary size @tab Match length limit
|
||||
If several compression levels or @samp{-s} or @samp{-m} options are
|
||||
given, the last setting is used. For example, @w{@samp{-9 -s64MiB}} is
|
||||
equivalent to @w{@samp{-s64MiB -m273}}.
|
||||
|
||||
@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
|
||||
@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
|
||||
@item -0 @tab 64 KiB @tab 16 bytes
|
||||
@item -1 @tab 1 MiB @tab 5 bytes
|
||||
@item -2 @tab 1.5 MiB @tab 6 bytes
|
||||
|
@ -446,10 +452,10 @@ is to make it so complicated that there are no obvious deficiencies. The
|
|||
first method is far more difficult.@*
|
||||
--- C.A.R. Hoare
|
||||
|
||||
Lzip has been designed, written and tested with great care to be the
|
||||
standard general-purpose compressor for unix-like systems. This chapter
|
||||
describes the lessons learned from previous compressors (gzip and
|
||||
bzip2), and their application to the design of lzip.
|
||||
Lzip has been designed, written and tested with great care to replace
|
||||
gzip and bzip2 as the standard general-purpose compressed format for
|
||||
unix-like systems. This chapter describes the lessons learned from
|
||||
these previous formats, and their application to the design of lzip.
|
||||
|
||||
@sp 1
|
||||
@section Format design
|
||||
|
@ -489,18 +495,21 @@ is extraordinarily safe. It provides embedded error detection. Any
|
|||
distance larger than the dictionary size acts as a forbidden symbol,
|
||||
allowing the decompressor to detect the approximate position of errors,
|
||||
and leaving very little work for the check sequence (CRC and data sizes)
|
||||
in the detection of errors. Lzip is usually able to detect all posible
|
||||
in the detection of errors. Lzip is usually able to detect all possible
|
||||
bit flips in the compressed data without resorting to the check
|
||||
sequence. It would be difficult to write an automatic recovery tool like
|
||||
lziprecover for the gzip format. And, as far as I know, it has never
|
||||
been written.
|
||||
|
||||
Lzip, like gzip and bzip2, uses a CRC32 to check the integrity of the
|
||||
decompressed data because it provides more accurate error detection than
|
||||
CRC64 up to a compressed size of about @w{16 GiB}, a size larger than
|
||||
that of most files. In the case of lzip, the additional detection
|
||||
decompressed data because it provides optimal accuracy in the detection
|
||||
of errors up to a compressed size of about @w{16 GiB}, a size larger
|
||||
than that of most files. In the case of lzip, the additional detection
|
||||
capability of the decompressor reduces the probability of undetected
|
||||
errors more than a million times beyond what the CRC32 alone provides.
|
||||
errors about four million times more, resulting in a combined integrity
|
||||
checking optimally accurate for any member size produced by lzip.
|
||||
Preliminary results suggest that the lzip format is safe enough to be
|
||||
used in critical safety avionics systems.
|
||||
|
||||
The lzip format is designed for long-term archiving. Therefore it
|
||||
excludes any unneeded features that may interfere with the future
|
||||
|
@ -559,7 +568,7 @@ size. The size of any file larger than @w{4 GiB} gets truncated.
|
|||
Bzip2 does not store the uncompressed size of the file.
|
||||
|
||||
The lzip format provides a 64-bit field for the uncompressed size.
|
||||
Additionaly, lzip produces multimember output automatically when the
|
||||
Additionally, lzip produces multimember output automatically when the
|
||||
size is too large for a single member, allowing for an unlimited
|
||||
uncompressed size.
|
||||
|
||||
|
@ -614,10 +623,10 @@ vulnerability or false negative.
 
 @item Dictionary size
 
-Lzip automatically uses the smallest possible dictionary size for each
-file. In addition to reducing the amount of memory required for
-decompression, this feature also minimizes the probability of being
-affected by RAM errors during compression.
+Lzip automatically adapts the dictionary size to the size of each file.
+In addition to reducing the amount of memory required for decompression,
+this feature also minimizes the probability of being affected by RAM
+errors during compression. @c key4_mask
 
 @item Exit status
 
|
@ -674,12 +683,13 @@ A four byte string, identifying the lzip format, with the value "LZIP"
 @item VN (version number, 1 byte)
 Just in case something needs to be modified in the future. 1 for now.
 
+@anchor{coded-dict-size}
 @item DS (coded dictionary size, 1 byte)
 The dictionary size is calculated by taking a power of 2 (the base size)
-and substracting from it a fraction between 0/16 and 7/16 of the base
+and subtracting from it a fraction between 0/16 and 7/16 of the base
 size.@*
 Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
-Bits 7-5 contain the numerator of the fraction (0 to 7) to substract
+Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
 from the base size to obtain the dictionary size.@*
 Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
 Valid values for dictionary size range from 4 KiB to 512 MiB.
|
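The DS coding described in the hunk above can be sketched in C as follows. This is a minimal illustration, not code taken from clzip: the function name `decode_dict_size`, the two `*_dictionary_bits` constants and the "return 0 on error" convention are chosen here for the example. The sketch also assumes that no fraction is subtracted from the minimum base size, so that the result never falls below 4 KiB.

```c
#include <assert.h>
#include <stdint.h>

/* Limits taken from the text: base 2 logarithm from 12 to 29. */
enum { min_dictionary_bits = 12, max_dictionary_bits = 29 };

/* Decode a DS byte into a dictionary size in bytes.
   Returns 0 if the base size is out of range (hypothetical convention). */
static unsigned decode_dict_size( const uint8_t ds )
  {
  const int bits = ds & 0x1F;                   /* bits 4-0: log2 of base size */
  const unsigned numerator = ( ds >> 5 ) & 7;   /* bits 7-5: sixteenths to subtract */
  unsigned size;

  if( bits < min_dictionary_bits || bits > max_dictionary_bits ) return 0;
  size = 1U << bits;
  /* Assumption of this sketch: the minimum base size is never reduced. */
  if( bits > min_dictionary_bits ) size -= ( size / 16 ) * numerator;
  assert( size >= ( 1U << min_dictionary_bits ) );
  return size;
  }
```

With the example byte from the text, `decode_dict_size( 0xD3 )` gives 2^19 - 6 * 2^15 = 327680 bytes, that is, 320 KiB.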
@ -939,7 +949,7 @@ are:
 @sp 1
 The contexts for decoding the type of coding sequence are:
 
-@multitable @columnfractions .2 .4 .4
+@multitable @columnfractions .2 .35 .45
 @headitem Name @tab Indices @tab Used when
 @item bm_match @tab state, pos_state @tab sequence start
 @item bm_rep @tab state @tab after sequence 1
|
@ -952,7 +962,7 @@ The contexts for decoding the type of coding sequence are:
 @sp 1
 The contexts for decoding distances are:
 
-@multitable @columnfractions .2 .4 .4
+@multitable @columnfractions .2 .3 .5
 @headitem Name @tab Indices @tab Used when
 @item bm_dis_slot @tab len_state, bit tree @tab distance start
 @item bm_dis @tab reverse bit tree @tab after slots 4 to 13
|
@ -1073,9 +1083,12 @@ where a file containing trailing data must be rejected, the option
 WARNING! Even if clzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
 Therefore, if the data you are going to compress are important, give the
-@samp{--keep} option to clzip and don't remove the original file until
-you verify the compressed file with a command like
-@w{@samp{clzip -cd file.lz | cmp file -}}.
+@samp{--keep} option to clzip and don't remove the original file until you
+verify the compressed file with a command like
+@w{@samp{clzip -cd file.lz | cmp file -}}. Most RAM errors happening during
+compression can only be detected by comparing the compressed file with the
+original because the corruption happens before clzip compresses the RAM
+contents, resulting in a valid compressed file containing wrong data.
 
 @sp 1
 @noindent
|
@ -1203,7 +1216,7 @@ find by running @w{@code{clzip --version}}.
|
|||
|
||||
@verbatim
|
||||
/* Lzd - Educational decompressor for the lzip format
|
||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software. Redistribution and use in source and
|
||||
binary forms, with or without modification, are permitted provided
|
||||
|
@ -1233,7 +1246,7 @@ find by running @w{@code{clzip --version}}.
|
|||
#include <cstring>
|
||||
#include <stdint.h>
|
||||
#include <unistd.h>
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#include <fcntl.h>
|
||||
#include <io.h>
|
||||
#endif
|
||||
|
@ -1334,9 +1347,9 @@ public:
|
|||
const CRC32 crc32;
|
||||
|
||||
|
||||
typedef uint8_t File_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
|
||||
typedef uint8_t Lzip_header[6]; // 0-3 magic, 4 version, 5 coded_dict_size
|
||||
|
||||
typedef uint8_t File_trailer[20];
|
||||
typedef uint8_t Lzip_trailer[20];
|
||||
// 0-3 CRC32 of the uncompressed data
|
||||
// 4-11 size of the uncompressed data
|
||||
// 12-19 member size including header and trailer
|
||||
|
@ -1530,6 +1543,7 @@ bool LZ_decoder::decode_member() // Returns false if error
|
|||
const int pos_state = data_position() & pos_state_mask;
|
||||
if( rdec.decode_bit( bm_match[state()][pos_state] ) == 0 ) // 1st bit
|
||||
{
|
||||
// literal byte
|
||||
const uint8_t prev_byte = peek( 0 );
|
||||
const int literal_state = prev_byte >> ( 8 - literal_context_bits );
|
||||
Bit_model * const bm = bm_literal[literal_state];
|
||||
|
@ -1538,67 +1552,66 @@ bool LZ_decoder::decode_member() // Returns false if error
|
|||
else
|
||||
put_byte( rdec.decode_matched( bm, peek( rep0 ) ) );
|
||||
state.set_char();
|
||||
continue;
|
||||
}
|
||||
else // match or repeated match
|
||||
// match or repeated match
|
||||
int len;
|
||||
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
|
||||
{
|
||||
int len;
|
||||
if( rdec.decode_bit( bm_rep[state()] ) != 0 ) // 2nd bit
|
||||
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
|
||||
{
|
||||
if( rdec.decode_bit( bm_rep0[state()] ) == 0 ) // 3rd bit
|
||||
{
|
||||
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
|
||||
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
|
||||
}
|
||||
if( rdec.decode_bit( bm_len[state()][pos_state] ) == 0 ) // 4th bit
|
||||
{ state.set_short_rep(); put_byte( peek( rep0 ) ); continue; }
|
||||
}
|
||||
else
|
||||
{
|
||||
unsigned distance;
|
||||
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
|
||||
distance = rep1;
|
||||
else
|
||||
{
|
||||
unsigned distance;
|
||||
if( rdec.decode_bit( bm_rep1[state()] ) == 0 ) // 4th bit
|
||||
distance = rep1;
|
||||
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
|
||||
distance = rep2;
|
||||
else
|
||||
{
|
||||
if( rdec.decode_bit( bm_rep2[state()] ) == 0 ) // 5th bit
|
||||
distance = rep2;
|
||||
else
|
||||
{ distance = rep3; rep3 = rep2; }
|
||||
rep2 = rep1;
|
||||
}
|
||||
rep1 = rep0;
|
||||
rep0 = distance;
|
||||
{ distance = rep3; rep3 = rep2; }
|
||||
rep2 = rep1;
|
||||
}
|
||||
state.set_rep();
|
||||
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
|
||||
rep1 = rep0;
|
||||
rep0 = distance;
|
||||
}
|
||||
else // match
|
||||
{
|
||||
rep3 = rep2; rep2 = rep1; rep1 = rep0;
|
||||
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
|
||||
const int len_state = std::min( len - min_match_len, len_states - 1 );
|
||||
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
|
||||
if( rep0 >= start_dis_model )
|
||||
{
|
||||
const unsigned dis_slot = rep0;
|
||||
const int direct_bits = ( dis_slot >> 1 ) - 1;
|
||||
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
|
||||
if( dis_slot < end_dis_model )
|
||||
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
|
||||
direct_bits );
|
||||
else
|
||||
{
|
||||
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
|
||||
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
|
||||
if( rep0 == 0xFFFFFFFFU ) // marker found
|
||||
{
|
||||
flush_data();
|
||||
return ( len == min_match_len ); // End Of Stream marker
|
||||
}
|
||||
}
|
||||
}
|
||||
state.set_match();
|
||||
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
|
||||
{ flush_data(); return false; }
|
||||
}
|
||||
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
|
||||
state.set_rep();
|
||||
len = min_match_len + rdec.decode_len( rep_len_model, pos_state );
|
||||
}
|
||||
else // match
|
||||
{
|
||||
rep3 = rep2; rep2 = rep1; rep1 = rep0;
|
||||
len = min_match_len + rdec.decode_len( match_len_model, pos_state );
|
||||
const int len_state = std::min( len - min_match_len, len_states - 1 );
|
||||
rep0 = rdec.decode_tree( bm_dis_slot[len_state], dis_slot_bits );
|
||||
if( rep0 >= start_dis_model )
|
||||
{
|
||||
const unsigned dis_slot = rep0;
|
||||
const int direct_bits = ( dis_slot >> 1 ) - 1;
|
||||
rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
|
||||
if( dis_slot < end_dis_model )
|
||||
rep0 += rdec.decode_tree_reversed( bm_dis + ( rep0 - dis_slot ),
|
||||
direct_bits );
|
||||
else
|
||||
{
|
||||
rep0 += rdec.decode( direct_bits - dis_align_bits ) << dis_align_bits;
|
||||
rep0 += rdec.decode_tree_reversed( bm_align, dis_align_bits );
|
||||
if( rep0 == 0xFFFFFFFFU ) // marker found
|
||||
{
|
||||
flush_data();
|
||||
return ( len == min_match_len ); // End Of Stream marker
|
||||
}
|
||||
}
|
||||
}
|
||||
state.set_match();
|
||||
if( rep0 >= dictionary_size || ( rep0 >= pos && !pos_wrapped ) )
|
||||
{ flush_data(); return false; }
|
||||
}
|
||||
for( int i = 0; i < len; ++i ) put_byte( peek( rep0 ) );
|
||||
}
|
||||
flush_data();
|
||||
return false;
|
||||
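For readers following the distance decoding in the hunk above, the mapping from a distance slot to its base value and number of extra bits can be isolated in a small C sketch. The names `Slot_range`, `slot_range` and `slot_uses_bm_dis` are hypothetical, and the values of `start_dis_model` and `end_dis_model` are assumptions chosen to agree with the "after slots 4 to 13" entry in the manual's context table. A 6-bit slot is also assumed.

```c
#include <assert.h>

/* Values assumed for this sketch; they match "slots 4 to 13" in the manual. */
enum { start_dis_model = 4, end_dis_model = 14 };

struct Slot_range
  {
  unsigned base;        /* smallest rep0 value coded by the slot */
  int direct_bits;      /* number of extra bits added to base */
  };

/* Same arithmetic as the 'match' branch of decode_member. */
static struct Slot_range slot_range( const unsigned dis_slot )
  {
  struct Slot_range r;
  assert( dis_slot < 64 );              /* 6-bit slot assumed */
  if( dis_slot < start_dis_model )      /* the slot is the distance itself */
    { r.base = dis_slot; r.direct_bits = 0; }
  else
    {
    r.direct_bits = ( dis_slot >> 1 ) - 1;
    r.base = ( 2U | ( dis_slot & 1 ) ) << r.direct_bits;
    }
  return r;
  }

/* True if the extra bits of this slot are decoded with the bm_dis models. */
static int slot_uses_bm_dis( const unsigned dis_slot )
  { return dis_slot >= start_dis_model && dis_slot < end_dis_model; }
```

Under these assumptions slot 63 has base 0xC0000000 and 30 extra bits, so its largest value is 0xFFFFFFFF, the value that the decoder tests for as the marker.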
|
@ -1616,7 +1629,7 @@ int main( const int argc, const char * const argv[] )
|
|||
"It is not safe to use lzd for any real work.\n"
|
||||
"\nUsage: %s < file.lz > file\n", argv[0] );
|
||||
std::printf( "Lzd decompresses from standard input to standard output.\n"
|
||||
"\nCopyright (C) 2018 Antonio Diaz Diaz.\n"
|
||||
"\nCopyright (C) 2019 Antonio Diaz Diaz.\n"
|
||||
"This is free software: you are free to change and redistribute it.\n"
|
||||
"There is NO WARRANTY, to the extent permitted by law.\n"
|
||||
"Report bugs to lzip-bug@nongnu.org\n"
|
||||
|
@ -1624,14 +1637,14 @@ int main( const int argc, const char * const argv[] )
|
|||
return 0;
|
||||
}
|
||||
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(_MSC_VER)
|
||||
setmode( fileno( stdin ), O_BINARY );
|
||||
setmode( fileno( stdout ), O_BINARY );
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
setmode( STDIN_FILENO, O_BINARY );
|
||||
setmode( STDOUT_FILENO, O_BINARY );
|
||||
#endif
|
||||
|
||||
for( bool first_member = true; ; first_member = false )
|
||||
{
|
||||
File_header header; // verify header
|
||||
Lzip_header header; // verify header
|
||||
for( int i = 0; i < 6; ++i ) header[i] = std::getc( stdin );
|
||||
if( std::feof( stdin ) || std::memcmp( header, "LZIP\x01", 5 ) != 0 )
|
||||
{
|
||||
|
@ -1650,7 +1663,7 @@ int main( const int argc, const char * const argv[] )
|
|||
if( !decoder.decode_member() )
|
||||
{ std::fputs( "Data error\n", stderr ); return 2; }
|
||||
|
||||
File_trailer trailer; // verify trailer
|
||||
Lzip_trailer trailer; // verify trailer
|
||||
for( int i = 0; i < 20; ++i ) trailer[i] = std::getc( stdin );
|
||||
unsigned crc = 0;
|
||||
for( int i = 3; i >= 0; --i ) { crc <<= 8; crc += trailer[i]; }
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -325,7 +325,7 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e,
|
|||
--prev_index;
|
||||
else /* prev_index2 >= 0 */
|
||||
prev_index = prev_index2;
|
||||
cur_state = 8; /* St_set_char_rep(); */
|
||||
cur_state = St_set_char_rep();
|
||||
}
|
||||
cur_trial->state = cur_state;
|
||||
for( i = 0; i < num_rep_distances; ++i )
|
||||
|
@ -496,7 +496,7 @@ bool LZe_encode_member( struct LZ_encoder * const e,
|
|||
const unsigned long long member_size )
|
||||
{
|
||||
const unsigned long long member_size_limit =
|
||||
member_size - Ft_size - max_marker_size;
|
||||
member_size - Lt_size - max_marker_size;
|
||||
const bool best = ( e->match_len_limit > 12 );
|
||||
const int dis_price_count = best ? 1 : 512;
|
||||
const int align_price_count = best ? 1 : dis_align_size;
|
||||
|
@ -510,7 +510,7 @@ bool LZe_encode_member( struct LZ_encoder * const e,
|
|||
for( i = 0; i < num_rep_distances; ++i ) reps[i] = 0;
|
||||
|
||||
if( Mb_data_position( &e->eb.mb ) != 0 ||
|
||||
Re_member_position( &e->eb.renc ) != Fh_size )
|
||||
Re_member_position( &e->eb.renc ) != Lh_size )
|
||||
return false; /* can be called only once */
|
||||
|
||||
if( !Mb_data_finished( &e->eb.mb ) ) /* encode first byte */
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -54,7 +54,8 @@ void Mb_normalize_pos( struct Matchfinder_base * const mb )
|
|||
if( !mb->at_stream_end )
|
||||
{
|
||||
int i;
|
||||
const int offset = mb->pos - mb->before_size - mb->dictionary_size;
|
||||
/* offset is int32_t for the min below */
|
||||
const int32_t offset = mb->pos - mb->before_size - mb->dictionary_size;
|
||||
const int size = mb->stream_pos - offset;
|
||||
memmove( mb->buffer, mb->buffer + offset, size );
|
||||
mb->partial_data_pos += offset;
|
||||
|
@ -110,7 +111,7 @@ bool Mb_init( struct Matchfinder_base * const mb, const int before_size,
|
|||
size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
|
||||
if( mb->dictionary_size > 1 << 26 ) /* 64 MiB */
|
||||
size >>= 1;
|
||||
mb->key4_mask = size - 1;
|
||||
mb->key4_mask = size - 1; /* increases with dictionary size */
|
||||
size += num_prev_positions23;
|
||||
mb->num_prev_positions = size;
|
||||
|
||||
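The new comment says that key4_mask increases with dictionary size. A standalone C sketch of the computation in this hunk makes the relation concrete. `key4_table_size` is a hypothetical wrapper written for this example, and the definition of `real_bits` (number of bits needed to represent a value) is an assumption, since its body is not part of this diff.

```c
#include <assert.h>

/* Number of bits needed to represent 'value' (assumed definition). */
static int real_bits( unsigned value )
  {
  int bits = 0;
  while( value > 0 ) { value >>= 1; ++bits; }
  return bits;
  }

/* Size of the key4 hash table for a given dictionary size,
   following the expression used in Mb_init. The mask is size - 1. */
static unsigned key4_table_size( const unsigned dictionary_size )
  {
  int bits;
  unsigned size;
  assert( dictionary_size > 0 );
  bits = real_bits( dictionary_size - 1 ) - 2;
  if( bits < 16 ) bits = 16;                    /* max( 16, ... ) */
  size = 1U << bits;
  if( dictionary_size > ( 1U << 26 ) ) size >>= 1;      /* above 64 MiB */
  return size;
  }
```

For example, a dictionary of 8 MiB gives a table of 2^21 entries (mask 0x1FFFFF), while any dictionary of 256 KiB or less gives the minimum of 2^16 entries.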
|
@ -171,15 +172,15 @@ void LZeb_full_flush( struct LZ_encoder_base * const eb, const State state )
|
|||
{
|
||||
int i;
|
||||
const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask;
|
||||
File_trailer trailer;
|
||||
Lzip_trailer trailer;
|
||||
Re_encode_bit( &eb->renc, &eb->bm_match[state][pos_state], 1 );
|
||||
Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 );
|
||||
LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len, pos_state );
|
||||
Re_flush( &eb->renc );
|
||||
Ft_set_data_crc( trailer, LZeb_crc( eb ) );
|
||||
Ft_set_data_size( trailer, Mb_data_position( &eb->mb ) );
|
||||
Ft_set_member_size( trailer, Re_member_position( &eb->renc ) + Ft_size );
|
||||
for( i = 0; i < Ft_size; ++i )
|
||||
Lt_set_data_crc( trailer, LZeb_crc( eb ) );
|
||||
Lt_set_data_size( trailer, Mb_data_position( &eb->mb ) );
|
||||
Lt_set_member_size( trailer, Re_member_position( &eb->renc ) + Lt_size );
|
||||
for( i = 0; i < Lt_size; ++i )
|
||||
Re_put_byte( &eb->renc, trailer[i] );
|
||||
Re_flush_data( &eb->renc );
|
||||
}
|
||||
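The trailer written by LZeb_full_flush has the 20-byte layout documented in the manual (bytes 0-3 CRC32, 4-11 data size, 12-19 member size). The C sketch below shows one way to fill and read such a trailer. The helpers `put_le`, `get_le` and `fill_trailer` are hypothetical names for this example and are not the Lt_* functions from lzip.h. Little-endian byte order is assumed, in line with lzd reading the CRC from trailer[3] down to trailer[0].

```c
#include <assert.h>
#include <stdint.h>

enum { Lt_size = 20 };
typedef uint8_t Lzip_trailer[Lt_size];

/* Hypothetical helper: store the low n bytes of v, least significant first. */
static void put_le( uint8_t * const p, unsigned long long v, const int n )
  {
  int i;
  for( i = 0; i < n; ++i ) { p[i] = (uint8_t)v; v >>= 8; }
  }

/* Hypothetical helper: load an n-byte little-endian value. */
static unsigned long long get_le( const uint8_t * const p, const int n )
  {
  unsigned long long v = 0;
  int i;
  for( i = n - 1; i >= 0; --i ) { v <<= 8; v += p[i]; }
  return v;
  }

static void fill_trailer( Lzip_trailer trailer, const unsigned crc,
                          const unsigned long long data_size,
                          const unsigned long long member_size )
  {
  put_le( trailer, crc, 4 );                    /* bytes 0-3 */
  put_le( trailer + 4, data_size, 8 );          /* bytes 4-11 */
  put_le( trailer + 12, member_size, 8 );       /* bytes 12-19 */
  }
```

Note that the member size stored in the trailer includes the header and the trailer itself, which is why the hunk above adds Lt_size to the current member position.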
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -237,7 +237,7 @@ struct Range_encoder
|
|||
unsigned ff_count;
|
||||
int outfd; /* output file descriptor */
|
||||
uint8_t cache;
|
||||
File_header header;
|
||||
Lzip_header header;
|
||||
};
|
||||
|
||||
void Re_flush_data( struct Range_encoder * const renc );
|
||||
|
@ -273,8 +273,8 @@ static inline void Re_reset( struct Range_encoder * const renc,
|
|||
renc->range = 0xFFFFFFFFU;
|
||||
renc->ff_count = 0;
|
||||
renc->cache = 0;
|
||||
Fh_set_dictionary_size( renc->header, dictionary_size );
|
||||
for( i = 0; i < Fh_size; ++i )
|
||||
Lh_set_dictionary_size( renc->header, dictionary_size );
|
||||
for( i = 0; i < Lh_size; ++i )
|
||||
Re_put_byte( renc, renc->header[i] );
|
||||
}
|
||||
|
||||
|
@ -284,7 +284,7 @@ static inline bool Re_init( struct Range_encoder * const renc,
|
|||
renc->buffer = (uint8_t *)malloc( re_buffer_size );
|
||||
if( !renc->buffer ) return false;
|
||||
renc->outfd = ofd;
|
||||
Fh_set_magic( renc->header );
|
||||
Lh_set_magic( renc->header );
|
||||
Re_reset( renc, dictionary_size );
|
||||
return true;
|
||||
}
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -74,14 +74,14 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe,
|
|||
const unsigned long long member_size )
|
||||
{
|
||||
const unsigned long long member_size_limit =
|
||||
member_size - Ft_size - max_marker_size;
|
||||
member_size - Lt_size - max_marker_size;
|
||||
int rep = 0, i;
|
||||
int reps[num_rep_distances];
|
||||
State state = 0;
|
||||
for( i = 0; i < num_rep_distances; ++i ) reps[i] = 0;
|
||||
|
||||
if( Mb_data_position( &fe->eb.mb ) != 0 ||
|
||||
Re_member_position( &fe->eb.renc ) != Fh_size )
|
||||
Re_member_position( &fe->eb.renc ) != Lh_size )
|
||||
return false; /* can be called only once */
|
||||
|
||||
if( !Mb_data_finished( &fe->eb.mb ) ) /* encode first byte */
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -33,16 +33,16 @@ int FLZe_longest_match_len( struct FLZ_encoder * const fe, int * const distance
|
|||
|
||||
static inline void FLZe_update_and_move( struct FLZ_encoder * const fe, int n )
|
||||
{
|
||||
struct Matchfinder_base * const mb = &fe->eb.mb;
|
||||
while( --n >= 0 )
|
||||
{
|
||||
if( Mb_available_bytes( &fe->eb.mb ) >= 4 )
|
||||
if( Mb_available_bytes( mb ) >= 4 )
|
||||
{
|
||||
fe->key4 = ( ( fe->key4 << 4 ) ^ fe->eb.mb.buffer[fe->eb.mb.pos+3] ) &
|
||||
fe->eb.mb.key4_mask;
|
||||
fe->eb.mb.pos_array[fe->eb.mb.cyclic_pos] = fe->eb.mb.prev_positions[fe->key4];
|
||||
fe->eb.mb.prev_positions[fe->key4] = fe->eb.mb.pos + 1;
|
||||
fe->key4 = ( ( fe->key4 << 4 ) ^ mb->buffer[mb->pos+3] ) & mb->key4_mask;
|
||||
mb->pos_array[mb->cyclic_pos] = mb->prev_positions[fe->key4];
|
||||
mb->prev_positions[fe->key4] = mb->pos + 1;
|
||||
}
|
||||
Mb_move_pos( &fe->eb.mb );
|
||||
Mb_move_pos( mb );
|
||||
}
|
||||
}
|
||||
|
||||
|
|
272
file_index.c
|
@ -1,272 +0,0 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
*/
|
||||
|
||||
#define _FILE_OFFSET_BITS 64
|
||||
|
||||
#include <errno.h>
|
||||
#include <stdbool.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
#include <stdint.h>
|
||||
#include <stdlib.h>
|
||||
#include <unistd.h>
|
||||
|
||||
#include "lzip.h"
|
||||
#include "file_index.h"
|
||||
|
||||
|
||||
static int seek_read( const int fd, uint8_t * const buf, const int size,
|
||||
const long long pos )
|
||||
{
|
||||
if( lseek( fd, pos, SEEK_SET ) == pos )
|
||||
return readblock( fd, buf, size );
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
static bool add_error( struct File_index * const fi, const char * const msg )
|
||||
{
|
||||
const int len = strlen( msg );
|
||||
void * tmp = resize_buffer( fi->error, fi->error_size + len + 1 );
|
||||
if( !tmp ) return false;
|
||||
fi->error = (char *)tmp;
|
||||
strncpy( fi->error + fi->error_size, msg, len + 1 );
|
||||
fi->error_size += len;
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
static bool push_back_member( struct File_index * const fi,
|
||||
const long long dp, const long long ds,
|
||||
const long long mp, const long long ms,
|
||||
const unsigned dict_size )
|
||||
{
|
||||
struct Member * p;
|
||||
void * tmp = resize_buffer( fi->member_vector,
|
||||
( fi->members + 1 ) * sizeof fi->member_vector[0] );
|
||||
if( !tmp )
|
||||
{ add_error( fi, "Not enough memory." ); fi->retval = 1; return false; }
|
||||
fi->member_vector = (struct Member *)tmp;
|
||||
p = &(fi->member_vector[fi->members]);
|
||||
init_member( p, dp, ds, mp, ms, dict_size );
|
||||
++fi->members;
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
static void Fi_free_member_vector( struct File_index * const fi )
|
||||
{
|
||||
if( fi->member_vector )
|
||||
{ free( fi->member_vector ); fi->member_vector = 0; }
|
||||
fi->members = 0;
|
||||
}
|
||||
|
||||
|
||||
static void Fi_reverse_member_vector( struct File_index * const fi )
|
||||
{
|
||||
struct Member tmp;
|
||||
long i;
|
||||
for( i = 0; i < fi->members / 2; ++i )
|
||||
{
|
||||
tmp = fi->member_vector[i];
|
||||
fi->member_vector[i] = fi->member_vector[fi->members-i-1];
|
||||
fi->member_vector[fi->members-i-1] = tmp;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
static void Fi_set_errno_error( struct File_index * const fi,
|
||||
const char * const msg )
|
||||
{
|
||||
add_error( fi, msg ); add_error( fi, strerror( errno ) );
|
||||
fi->retval = 1;
|
||||
}
|
||||
|
||||
static void Fi_set_num_error( struct File_index * const fi,
|
||||
const char * const msg, unsigned long long num )
|
||||
{
|
||||
char buf[80];
|
||||
snprintf( buf, sizeof buf, "%s%llu", msg, num );
|
||||
add_error( fi, buf );
|
||||
fi->retval = 2;
|
||||
}
|
||||
|
||||
|
||||
/* If successful, push last member and set pos to member header. */
|
||||
static bool Fi_skip_trailing_data( struct File_index * const fi,
|
||||
const int fd, long long * const pos,
|
||||
const bool ignore_trailing,
|
||||
const bool loose_trailing )
|
||||
{
|
||||
enum { block_size = 16384,
|
||||
buffer_size = block_size + Ft_size - 1 + Fh_size };
|
||||
uint8_t buffer[buffer_size];
|
||||
int bsize = *pos % block_size; /* total bytes in buffer */
|
||||
int search_size, rd_size;
|
||||
unsigned long long ipos;
|
||||
int i;
|
||||
if( bsize <= buffer_size - block_size ) bsize += block_size;
|
||||
search_size = bsize; /* bytes to search for trailer */
|
||||
rd_size = bsize; /* bytes to read from file */
|
||||
ipos = *pos - rd_size; /* aligned to block_size */
|
||||
if( *pos < min_member_size ) return false;
|
||||
|
||||
while( true )
|
||||
{
|
||||
const uint8_t max_msb = ( ipos + search_size ) >> 56;
|
||||
if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
|
||||
{ Fi_set_errno_error( fi, "Error seeking member trailer: " );
|
||||
return false; }
|
||||
for( i = search_size; i >= Ft_size; --i )
|
||||
if( buffer[i-1] <= max_msb ) /* most significant byte of member_size */
|
||||
{
|
||||
File_header header;
|
||||
File_trailer * trailer = (File_trailer *)( buffer + i - Ft_size );
|
||||
const unsigned long long member_size = Ft_get_member_size( *trailer );
|
||||
unsigned dictionary_size;
|
||||
if( member_size == 0 )
|
||||
{ while( i > Ft_size && buffer[i-9] == 0 ) --i; continue; }
|
||||
if( member_size < min_member_size || member_size > ipos + i )
|
||||
continue;
|
||||
if( seek_read( fd, header, Fh_size,
|
||||
ipos + i - member_size ) != Fh_size )
|
||||
{ Fi_set_errno_error( fi, "Error reading member header: " );
|
||||
return false; }
|
||||
dictionary_size = Fh_get_dictionary_size( header );
|
||||
if( !Fh_verify_magic( header ) || !Fh_verify_version( header ) ||
|
||||
!isvalid_ds( dictionary_size ) ) continue;
|
||||
if( Fh_verify_prefix( buffer + i, bsize - i ) )
|
||||
{
|
||||
add_error( fi, "Last member in input file is truncated or corrupt." );
|
||||
fi->retval = 2; return false;
|
||||
}
|
||||
if( !loose_trailing && bsize - i >= Fh_size &&
|
||||
Fh_verify_corrupt( buffer + i ) )
|
||||
{ add_error( fi, corrupt_mm_msg ); fi->retval = 2; return false; }
|
||||
if( !ignore_trailing )
|
||||
{ add_error( fi, trailing_msg ); fi->retval = 2; return false; }
|
||||
*pos = ipos + i - member_size;
|
||||
return push_back_member( fi, 0, Ft_get_data_size( *trailer ), *pos,
|
||||
member_size, dictionary_size );
|
||||
}
|
||||
if( ipos <= 0 )
|
||||
{ Fi_set_num_error( fi, "Member size in trailer is corrupt at pos ",
|
||||
*pos - 8 );
|
||||
return false; }
|
||||
bsize = buffer_size;
|
||||
search_size = bsize - Fh_size;
|
||||
rd_size = block_size;
|
||||
ipos -= rd_size;
|
||||
memcpy( buffer + rd_size, buffer, buffer_size - rd_size );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
bool Fi_init( struct File_index * const fi, const int infd,
|
||||
const bool ignore_trailing, const bool loose_trailing )
|
||||
{
|
||||
File_header header;
|
||||
long long pos;
|
||||
long i;
|
||||
fi->member_vector = 0;
|
||||
fi->error = 0;
|
||||
fi->isize = lseek( infd, 0, SEEK_END );
|
||||
fi->members = 0;
|
||||
fi->error_size = 0;
|
||||
fi->retval = 0;
|
||||
if( fi->isize < 0 )
|
||||
{ Fi_set_errno_error( fi, "Input file is not seekable: " ); return false; }
|
||||
if( fi->isize < min_member_size )
|
||||
{ add_error( fi, "Input file is too short." ); fi->retval = 2;
|
||||
return false; }
|
||||
if( fi->isize > INT64_MAX )
|
||||
{ add_error( fi, "Input file is too long (2^63 bytes or more)." );
|
||||
fi->retval = 2; return false; }
|
||||
|
||||
if( seek_read( infd, header, Fh_size, 0 ) != Fh_size )
|
||||
{ Fi_set_errno_error( fi, "Error reading member header: " ); return false; }
|
||||
if( !Fh_verify_magic( header ) )
|
||||
{ add_error( fi, bad_magic_msg ); fi->retval = 2; return false; }
|
||||
if( !Fh_verify_version( header ) )
|
||||
{ add_error( fi, bad_version( Fh_version( header ) ) ); fi->retval = 2;
|
||||
return false; }
|
||||
if( !isvalid_ds( Fh_get_dictionary_size( header ) ) )
|
||||
{ add_error( fi, bad_dict_msg ); fi->retval = 2; return false; }
|
||||
|
||||
pos = fi->isize; /* always points to a header or to EOF */
|
||||
while( pos >= min_member_size )
|
||||
{
|
||||
File_trailer trailer;
|
||||
unsigned long long member_size;
|
||||
unsigned dictionary_size;
|
||||
if( seek_read( infd, trailer, Ft_size, pos - Ft_size ) != Ft_size )
|
||||
{ Fi_set_errno_error( fi, "Error reading member trailer: " ); break; }
|
||||
member_size = Ft_get_member_size( trailer );
|
||||
if( member_size < min_member_size || member_size > (unsigned long long)pos )
|
||||
{
|
||||
if( fi->members <= 0 )
|
||||
{ if( Fi_skip_trailing_data( fi, infd, &pos, ignore_trailing,
|
||||
loose_trailing ) ) continue; else return false; }
|
||||
Fi_set_num_error( fi, "Member size in trailer is corrupt at pos ", pos - 8 );
|
||||
break;
|
||||
}
|
||||
if( seek_read( infd, header, Fh_size, pos - member_size ) != Fh_size )
|
||||
{ Fi_set_errno_error( fi, "Error reading member header: " ); break; }
|
||||
dictionary_size = Fh_get_dictionary_size( header );
|
||||
if( !Fh_verify_magic( header ) || !Fh_verify_version( header ) ||
|
||||
!isvalid_ds( dictionary_size ) )
|
||||
{
|
||||
if( fi->members <= 0 )
|
||||
{ if( Fi_skip_trailing_data( fi, infd, &pos, ignore_trailing,
|
||||
loose_trailing ) ) continue; else return false; }
|
||||
Fi_set_num_error( fi, "Bad header at pos ", pos - member_size );
|
||||
break;
|
||||
}
|
||||
pos -= member_size;
|
||||
if( !push_back_member( fi, 0, Ft_get_data_size( trailer ), pos,
|
||||
member_size, dictionary_size ) )
|
||||
return false;
|
||||
}
|
||||
if( pos != 0 || fi->members <= 0 )
|
||||
{
|
||||
Fi_free_member_vector( fi );
|
||||
if( fi->retval == 0 )
|
||||
{ add_error( fi, "Can't create file index." ); fi->retval = 2; }
|
||||
return false;
|
||||
}
|
||||
Fi_reverse_member_vector( fi );
|
||||
for( i = 0; i < fi->members - 1; ++i )
|
||||
{
|
||||
const long long end = block_end( fi->member_vector[i].dblock );
|
||||
if( end < 0 || end > INT64_MAX )
|
||||
{
|
||||
Fi_free_member_vector( fi );
|
||||
add_error( fi, "Data in input file is too long (2^63 bytes or more)." );
|
||||
fi->retval = 2; return false;
|
||||
}
|
||||
fi->member_vector[i+1].dblock.pos = end;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
void Fi_free( struct File_index * const fi )
|
||||
{
|
||||
Fi_free_member_vector( fi );
|
||||
if( fi->error ) { free( fi->error ); fi->error = 0; }
|
||||
fi->error_size = 0;
|
||||
}
|
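The main loop of Fi_init walks the file backwards: it reads the member size from the last 8 bytes of a trailer, jumps to the corresponding header, verifies it, and repeats until position 0 is reached. The in-memory C sketch below illustrates only that idea. It is not the removed code: `count_members` and `write_dummy_member` are hypothetical, trailing data is not handled, the dictionary size is not verified, and the constant values (6-byte header, 20-byte trailer, 36-byte minimum member) are assumptions of the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sizes assumed for this sketch. */
enum { Lh_size = 6, Lt_size = 20, min_member_size = 36 };

static unsigned long long get_le64( const uint8_t * const p )
  {
  unsigned long long v = 0;
  int i;
  for( i = 7; i >= 0; --i ) { v <<= 8; v += p[i]; }
  return v;
  }

/* Scan 'buf' from the end to the beginning, one member at a time.
   Returns the number of members, or -1 if the chain is broken. */
static long count_members( const uint8_t * const buf,
                           const unsigned long long size )
  {
  unsigned long long pos = size;        /* points to a header or to the end */
  long members = 0;
  while( pos >= min_member_size )
    {
    const unsigned long long member_size = get_le64( buf + pos - 8 );
    if( member_size < min_member_size || member_size > pos ) return -1;
    if( memcmp( buf + pos - member_size, "LZIP\x01", 5 ) != 0 ) return -1;
    pos -= member_size;
    ++members;
    }
  return ( pos == 0 && members > 0 ) ? members : -1;
  }

/* Test helper: write a zero-filled fake member with valid magic, version
   and member size. The result is not a decodable lzip member. */
static void write_dummy_member( uint8_t * const p,
                                const unsigned long long member_size )
  {
  unsigned long long v = member_size;
  int i;
  assert( member_size >= min_member_size );
  memset( p, 0, member_size );
  memcpy( p, "LZIP\x01", 5 );           /* magic and version */
  p[5] = 0x0C;                          /* coded dictionary size: 4 KiB */
  for( i = 0; i < 8; ++i ) { p[member_size - 8 + i] = (uint8_t)v; v >>= 8; }
  }
```

Scanning from the end works because each trailer records the size of its own member, so no pass over the compressed data is needed to locate the headers.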
38
list.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -25,7 +25,7 @@
|
|||
#include <sys/stat.h>
|
||||
|
||||
#include "lzip.h"
|
||||
#include "file_index.h"
|
||||
#include "lzip_index.h"
|
||||
|
||||
|
||||
static void list_line( const unsigned long long uncomp_size,
|
||||
|
@ -53,7 +53,7 @@ int list_files( const char * const filenames[], const int num_filenames,
|
|||
for( i = 0; i < num_filenames; ++i )
|
||||
{
|
||||
const char * input_filename;
|
||||
struct File_index file_index;
|
||||
struct Lzip_index lzip_index;
|
||||
struct stat in_stats; /* not used */
|
||||
int infd;
|
||||
const bool from_stdin = ( strcmp( filenames[i], "-" ) == 0 );
|
||||
|
@ -63,18 +63,18 @@ int list_files( const char * const filenames[], const int num_filenames,
|
|||
open_instream( input_filename, &in_stats, true, true );
|
||||
if( infd < 0 ) { if( retval < 1 ) retval = 1; continue; }
|
||||
|
||||
Fi_init( &file_index, infd, ignore_trailing, loose_trailing );
|
||||
Li_init( &lzip_index, infd, ignore_trailing, loose_trailing );
|
||||
close( infd );
|
||||
if( file_index.retval != 0 )
|
||||
if( lzip_index.retval != 0 )
|
||||
{
|
||||
show_file_error( input_filename, file_index.error, 0 );
|
||||
if( retval < file_index.retval ) retval = file_index.retval;
|
||||
Fi_free( &file_index ); continue;
|
||||
show_file_error( input_filename, lzip_index.error, 0 );
|
||||
if( retval < lzip_index.retval ) retval = lzip_index.retval;
|
||||
Li_free( &lzip_index ); continue;
|
||||
}
|
||||
if( verbosity >= 0 )
|
||||
{
|
||||
const unsigned long long udata_size = Fi_udata_size( &file_index );
|
||||
const unsigned long long cdata_size = Fi_cdata_size( &file_index );
|
||||
const unsigned long long udata_size = Li_udata_size( &lzip_index );
|
||||
const unsigned long long cdata_size = Li_cdata_size( &lzip_index );
|
||||
total_comp += cdata_size; total_uncomp += udata_size; ++files;
|
||||
if( first_post )
|
||||
{
|
||||
|
@ -87,23 +87,23 @@ int list_files( const char * const filenames[], const int num_filenames,
|
|||
long long trailing_size;
|
||||
unsigned dictionary_size = 0;
|
||||
long i;
|
||||
for( i = 0; i < file_index.members; ++i )
|
||||
for( i = 0; i < lzip_index.members; ++i )
|
||||
dictionary_size =
|
||||
max( dictionary_size, Fi_dictionary_size( &file_index, i ) );
|
||||
trailing_size = Fi_file_size( &file_index ) - cdata_size;
|
||||
max( dictionary_size, Li_dictionary_size( &lzip_index, i ) );
|
||||
trailing_size = Li_file_size( &lzip_index ) - cdata_size;
|
||||
printf( "%s %5ld %6lld ", format_ds( dictionary_size ),
|
||||
file_index.members, trailing_size );
|
||||
lzip_index.members, trailing_size );
|
||||
}
|
||||
list_line( udata_size, cdata_size, input_filename );
|
||||
|
||||
if( verbosity >= 2 && file_index.members > 1 )
|
||||
if( verbosity >= 2 && lzip_index.members > 1 )
|
||||
{
|
||||
long i;
|
||||
fputs( " member data_pos data_size member_pos member_size\n", stdout );
|
||||
for( i = 0; i < file_index.members; ++i )
|
||||
for( i = 0; i < lzip_index.members; ++i )
|
||||
{
|
||||
const struct Block * db = Fi_dblock( &file_index, i );
|
||||
const struct Block * mb = Fi_mblock( &file_index, i );
|
||||
const struct Block * db = Li_dblock( &lzip_index, i );
|
||||
const struct Block * mb = Li_mblock( &lzip_index, i );
|
||||
printf( "%5ld %15llu %15llu %15llu %15llu\n",
|
||||
i + 1, db->pos, db->size, mb->pos, mb->size );
|
||||
}
|
||||
|
@ -111,7 +111,7 @@ int list_files( const char * const filenames[], const int num_filenames,
|
|||
}
|
||||
fflush( stdout );
|
||||
}
|
||||
Fi_free( &file_index );
|
||||
Li_free( &lzip_index );
|
||||
}
|
||||
if( verbosity >= 0 && files > 1 )
|
||||
{
|
||||
|
|
70
lzip.h
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -36,6 +36,8 @@ static inline State St_set_char( const State st )
|
|||
return next[st];
|
||||
}
|
||||
|
||||
static inline State St_set_char_rep() { return 8; }
|
||||
|
||||
static inline State St_set_match( const State st )
|
||||
{ return ( ( st < 7 ) ? 7 : 10 ); }
|
||||
|
||||
|
@ -119,7 +121,7 @@ static inline void Lm_init( struct Len_model * const lm )
|
|||
/* defined in main.c */
|
||||
extern int verbosity;
|
||||
|
||||
struct Pretty_print
|
||||
struct Pretty_print /* requires global var 'int verbosity' */
|
||||
{
|
||||
const char * name;
|
||||
char * padded_name;
|
||||
|
@ -146,7 +148,7 @@ static inline void Pp_init( struct Pretty_print * const pp,
|
|||
{
|
||||
const char * const s = filenames[i];
|
||||
const unsigned len = (strcmp( s, "-" ) == 0) ? stdin_name_len : strlen( s );
|
||||
if( len > pp->longest_name ) pp->longest_name = len;
|
||||
if( pp->longest_name < len ) pp->longest_name = len;
|
||||
}
|
||||
if( pp->longest_name == 0 ) pp->longest_name = stdin_name_len;
|
||||
}
|
||||
|
@ -220,43 +222,43 @@ static inline int real_bits( unsigned value )
|
|||
}
|
||||
|
||||
|
||||
static const uint8_t magic_string[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */
|
||||
static const uint8_t lzip_magic[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */
|
||||
|
||||
typedef uint8_t File_header[6]; /* 0-3 magic bytes */
|
||||
typedef uint8_t Lzip_header[6]; /* 0-3 magic bytes */
|
||||
/* 4 version */
|
||||
/* 5 coded_dict_size */
|
||||
enum { Fh_size = 6 };
|
||||
enum { Lh_size = 6 };
|
||||
|
||||
static inline void Fh_set_magic( File_header data )
|
||||
{ memcpy( data, magic_string, 4 ); data[4] = 1; }
|
||||
static inline void Lh_set_magic( Lzip_header data )
|
||||
{ memcpy( data, lzip_magic, 4 ); data[4] = 1; }
|
||||
|
||||
static inline bool Fh_verify_magic( const File_header data )
|
||||
{ return ( memcmp( data, magic_string, 4 ) == 0 ); }
|
||||
static inline bool Lh_verify_magic( const Lzip_header data )
|
||||
{ return ( memcmp( data, lzip_magic, 4 ) == 0 ); }
|
||||
|
||||
/* detect (truncated) header */
|
||||
static inline bool Fh_verify_prefix( const File_header data, const int sz )
|
||||
static inline bool Lh_verify_prefix( const Lzip_header data, const int sz )
|
||||
{
|
||||
int i; for( i = 0; i < sz && i < 4; ++i )
|
||||
if( data[i] != magic_string[i] ) return false;
|
||||
if( data[i] != lzip_magic[i] ) return false;
|
||||
return ( sz > 0 );
|
||||
}
|
||||
|
||||
/* detect corrupt header */
|
||||
static inline bool Fh_verify_corrupt( const File_header data )
|
||||
static inline bool Lh_verify_corrupt( const Lzip_header data )
|
||||
{
|
||||
int matches = 0;
|
||||
int i; for( i = 0; i < 4; ++i )
|
||||
if( data[i] == magic_string[i] ) ++matches;
|
||||
if( data[i] == lzip_magic[i] ) ++matches;
|
||||
return ( matches > 1 && matches < 4 );
|
||||
}
|
||||
|
||||
static inline uint8_t Fh_version( const File_header data )
|
||||
static inline uint8_t Lh_version( const Lzip_header data )
|
||||
{ return data[4]; }
|
||||
|
||||
static inline bool Fh_verify_version( const File_header data )
|
||||
static inline bool Lh_verify_version( const Lzip_header data )
|
||||
{ return ( data[4] == 1 ); }
|
||||
|
||||
static inline unsigned Fh_get_dictionary_size( const File_header data )
|
||||
static inline unsigned Lh_get_dictionary_size( const Lzip_header data )
|
||||
{
|
||||
unsigned sz = ( 1 << ( data[5] & 0x1F ) );
|
||||
if( sz > min_dictionary_size )
|
||||
|
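The header checks in the hunk above (`Lh_verify_magic`, `Lh_verify_prefix`, `Lh_verify_corrupt`) can be read together as a classification of whatever bytes sit where a member header is expected. The following is a minimal sketch of that reading, not the code from lzip.h: the names `Header_kind` and `classify_header` are hypothetical, and folding the three predicates into one function is an assumption made for illustration.

```c
#include <stdint.h>

enum { header_size = 6 };	/* 4 magic bytes + version + coded_dict_size */

static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };	/* "LZIP" */

/* Hypothetical result type; lzip.h returns a bool from each check instead. */
typedef enum { hdr_ok, hdr_truncated, hdr_corrupt, hdr_other } Header_kind;

/* Classify the 'size' bytes found at 'data'. */
Header_kind classify_header( const uint8_t * const data, const int size )
  {
  const int n = ( size < 4 ) ? size : 4;	/* magic bytes available */
  int matches = 0;
  int i;
  if( size <= 0 ) return hdr_other;
  for( i = 0; i < n; ++i ) if( data[i] == magic[i] ) ++matches;
  /* every available magic byte matches: complete or truncated header */
  if( matches == n ) return ( size >= header_size ) ? hdr_ok : hdr_truncated;
  /* 2 or 3 of the 4 magic bytes match: probably a damaged header */
  if( size >= 4 && matches > 1 ) return hdr_corrupt;
  return hdr_other;			/* ordinary trailing data */
  }
```

Treating two or three matching magic bytes as "corrupt" mirrors the `matches > 1 && matches < 4` test in `Lh_verify_corrupt`; a single matching byte is too likely to occur in arbitrary trailing data to be reported.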
@ -264,7 +266,7 @@ static inline unsigned Fh_get_dictionary_size( const File_header data )
|
|||
return sz;
|
||||
}
|
||||
|
||||
static inline bool Fh_set_dictionary_size( File_header data, const unsigned sz )
|
||||
static inline bool Lh_set_dictionary_size( Lzip_header data, const unsigned sz )
|
||||
{
|
||||
if( !isvalid_ds( sz ) ) return false;
|
||||
data[5] = real_bits( sz - 1 );
|
||||
|
@ -281,43 +283,57 @@ static inline bool Fh_set_dictionary_size( File_header data, const unsigned sz )
|
|||
}
|
||||
|
||||
|
||||
typedef uint8_t File_trailer[20];
|
||||
typedef uint8_t Lzip_trailer[20];
|
||||
/* 0-3 CRC32 of the uncompressed data */
|
||||
/* 4-11 size of the uncompressed data */
|
||||
/* 12-19 member size including header and trailer */
|
||||
enum { Lt_size = 20 };
|
||||
|
||||
enum { Ft_size = 20 };
|
||||
|
||||
static inline unsigned Ft_get_data_crc( const File_trailer data )
|
||||
static inline unsigned Lt_get_data_crc( const Lzip_trailer data )
|
||||
{
|
||||
unsigned tmp = 0;
|
||||
int i; for( i = 3; i >= 0; --i ) { tmp <<= 8; tmp += data[i]; }
|
||||
return tmp;
|
||||
}
|
||||
|
||||
static inline void Ft_set_data_crc( File_trailer data, unsigned crc )
|
||||
static inline void Lt_set_data_crc( Lzip_trailer data, unsigned crc )
|
||||
{ int i; for( i = 0; i <= 3; ++i ) { data[i] = (uint8_t)crc; crc >>= 8; } }
|
||||
|
||||
static inline unsigned long long Ft_get_data_size( const File_trailer data )
|
||||
static inline unsigned long long Lt_get_data_size( const Lzip_trailer data )
|
||||
{
|
||||
unsigned long long tmp = 0;
|
||||
int i; for( i = 11; i >= 4; --i ) { tmp <<= 8; tmp += data[i]; }
|
||||
return tmp;
|
||||
}
|
||||
|
||||
static inline void Ft_set_data_size( File_trailer data, unsigned long long sz )
|
||||
static inline void Lt_set_data_size( Lzip_trailer data, unsigned long long sz )
|
||||
{ int i; for( i = 4; i <= 11; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
|
||||
|
||||
static inline unsigned long long Ft_get_member_size( const File_trailer data )
|
||||
static inline unsigned long long Lt_get_member_size( const Lzip_trailer data )
|
||||
{
|
||||
unsigned long long tmp = 0;
|
||||
int i; for( i = 19; i >= 12; --i ) { tmp <<= 8; tmp += data[i]; }
|
||||
return tmp;
|
||||
}
|
||||
|
||||
static inline void Ft_set_member_size( File_trailer data, unsigned long long sz )
|
||||
static inline void Lt_set_member_size( Lzip_trailer data, unsigned long long sz )
|
||||
{ int i; for( i = 12; i <= 19; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
|
||||
|
||||
/* check internal consistency */
|
||||
static inline bool Lt_verify_consistency( const Lzip_trailer data )
|
||||
{
|
||||
const unsigned crc = Lt_get_data_crc( data );
|
||||
const unsigned long long dsize = Lt_get_data_size( data );
|
||||
const unsigned long long msize = Lt_get_member_size( data );
|
||||
const unsigned long long mlimit = ( 9 * dsize + 7 ) / 8 + min_member_size;
|
||||
const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
|
||||
if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
|
||||
if( msize < min_member_size ) return false;
|
||||
if( mlimit > dsize && msize > mlimit ) return false;
|
||||
if( dlimit > msize && dsize > dlimit ) return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
static const char * const bad_magic_msg = "Bad magic number (file not in lzip format).";
|
||||
static const char * const bad_dict_msg = "Invalid dictionary size in member header.";
|
||||
|
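The trailer accessors and the new `Lt_verify_consistency` in the hunk above rely on a fixed little-endian layout: bytes 0-3 hold the CRC32, bytes 4-11 the data size, bytes 12-19 the member size. A self-contained sketch of that layout and of the consistency test follows. The helper names (`put_le`, `get_le`, `make_trailer`, `trailer_is_consistent`) are hypothetical, and the value 36 for the minimum member size is an assumption, since `min_member_size` is defined outside the lines shown in this diff. The constants 9/8, 7090 and 26 are copied from the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

enum { trailer_size = 20,
       assumed_min_member_size = 36 };	/* assumption, not shown in the diff */

/* Store the low 'bytes' bytes of 'value' at 'p', least significant first. */
void put_le( uint8_t * const p, const int bytes, unsigned long long value )
  { int i; for( i = 0; i < bytes; ++i ) { p[i] = (uint8_t)value; value >>= 8; } }

/* Read a little-endian value of 'bytes' bytes starting at 'p'. */
unsigned long long get_le( const uint8_t * const p, const int bytes )
  {
  unsigned long long tmp = 0;
  int i; for( i = bytes - 1; i >= 0; --i ) { tmp <<= 8; tmp += p[i]; }
  return tmp;
  }

void make_trailer( uint8_t t[trailer_size], const unsigned crc,
                   const unsigned long long dsize,
                   const unsigned long long msize )
  { put_le( t, 4, crc ); put_le( t + 4, 8, dsize ); put_le( t + 12, 8, msize ); }

/* Same tests, in the same order, as Lt_verify_consistency in the diff. */
bool trailer_is_consistent( const uint8_t t[trailer_size] )
  {
  const unsigned crc = (unsigned)get_le( t, 4 );
  const unsigned long long dsize = get_le( t + 4, 8 );
  const unsigned long long msize = get_le( t + 12, 8 );
  const unsigned long long mlimit =
    ( 9 * dsize + 7 ) / 8 + assumed_min_member_size;
  const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
  if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
  if( msize < assumed_min_member_size ) return false;
  if( mlimit > dsize && msize > mlimit ) return false;	/* member too big */
  if( dlimit > msize && dsize > dlimit ) return false;	/* data too big */
  return true;
  }
```

The `mlimit > dsize` and `dlimit > msize` guards skip a bound when its computation has wrapped around in unsigned arithmetic, so an overflowed limit never rejects a trailer by itself.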
|
273
lzip_index.c
Normal file
|
@ -0,0 +1,273 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
*/
|
||||
|
||||
#define _FILE_OFFSET_BITS 64
|
||||
|
||||
#include <errno.h>
|
||||
#include <stdbool.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
#include <stdint.h>
|
||||
#include <stdlib.h>
|
||||
#include <unistd.h>
|
||||
|
||||
#include "lzip.h"
|
||||
#include "lzip_index.h"
|
||||
|
||||
|
||||
static int seek_read( const int fd, uint8_t * const buf, const int size,
|
||||
const long long pos )
|
||||
{
|
||||
if( lseek( fd, pos, SEEK_SET ) == pos )
|
||||
return readblock( fd, buf, size );
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
static bool add_error( struct Lzip_index * const li, const char * const msg )
|
||||
{
|
||||
const int len = strlen( msg );
|
||||
void * tmp = resize_buffer( li->error, li->error_size + len + 1 );
|
||||
if( !tmp ) return false;
|
||||
li->error = (char *)tmp;
|
||||
strncpy( li->error + li->error_size, msg, len + 1 );
|
||||
li->error_size += len;
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
static bool push_back_member( struct Lzip_index * const li,
|
||||
const long long dp, const long long ds,
|
||||
const long long mp, const long long ms,
|
||||
const unsigned dict_size )
|
||||
{
|
||||
struct Member * p;
|
||||
void * tmp = resize_buffer( li->member_vector,
|
||||
( li->members + 1 ) * sizeof li->member_vector[0] );
|
||||
if( !tmp )
|
||||
{ add_error( li, "Not enough memory." ); li->retval = 1; return false; }
|
||||
li->member_vector = (struct Member *)tmp;
|
||||
p = &(li->member_vector[li->members]);
|
||||
init_member( p, dp, ds, mp, ms, dict_size );
|
||||
++li->members;
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
static void Li_free_member_vector( struct Lzip_index * const li )
|
||||
{
|
||||
if( li->member_vector )
|
||||
{ free( li->member_vector ); li->member_vector = 0; }
|
||||
li->members = 0;
|
||||
}
|
||||
|
||||
|
||||
static void Li_reverse_member_vector( struct Lzip_index * const li )
|
||||
{
|
||||
struct Member tmp;
|
||||
long i;
|
||||
for( i = 0; i < li->members / 2; ++i )
|
||||
{
|
||||
tmp = li->member_vector[i];
|
||||
li->member_vector[i] = li->member_vector[li->members-i-1];
|
||||
li->member_vector[li->members-i-1] = tmp;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
static void Li_set_errno_error( struct Lzip_index * const li,
|
||||
const char * const msg )
|
||||
{
|
||||
add_error( li, msg ); add_error( li, strerror( errno ) );
|
||||
li->retval = 1;
|
||||
}
|
||||
|
||||
static void Li_set_num_error( struct Lzip_index * const li,
|
||||
const char * const msg, unsigned long long num )
|
||||
{
|
||||
char buf[80];
|
||||
snprintf( buf, sizeof buf, "%s%llu", msg, num );
|
||||
add_error( li, buf );
|
||||
li->retval = 2;
|
||||
}
|
||||
|
||||
|
||||
/* If successful, push last member and set pos to member header. */
|
||||
static bool Li_skip_trailing_data( struct Lzip_index * const li,
|
||||
const int fd, long long * const pos,
|
||||
const bool ignore_trailing,
|
||||
const bool loose_trailing )
|
||||
{
|
||||
enum { block_size = 16384,
|
||||
buffer_size = block_size + Lt_size - 1 + Lh_size };
|
||||
uint8_t buffer[buffer_size];
|
||||
int bsize = *pos % block_size; /* total bytes in buffer */
|
||||
int search_size, rd_size;
|
||||
unsigned long long ipos;
|
||||
int i;
|
||||
if( *pos < min_member_size ) return false;
|
||||
if( bsize <= buffer_size - block_size ) bsize += block_size;
|
||||
search_size = bsize; /* bytes to search for trailer */
|
||||
rd_size = bsize; /* bytes to read from file */
|
||||
ipos = *pos - rd_size; /* aligned to block_size */
|
||||
|
||||
while( true )
|
||||
{
|
||||
const uint8_t max_msb = ( ipos + search_size ) >> 56;
|
||||
if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
|
||||
{ Li_set_errno_error( li, "Error seeking member trailer: " );
|
||||
return false; }
|
||||
for( i = search_size; i >= Lt_size; --i )
|
||||
if( buffer[i-1] <= max_msb ) /* most significant byte of member_size */
|
||||
{
|
||||
Lzip_header header;
|
||||
const Lzip_trailer * const trailer =
|
||||
(const Lzip_trailer *)( buffer + i - Lt_size );
|
||||
const unsigned long long member_size = Lt_get_member_size( *trailer );
|
||||
unsigned dictionary_size;
|
||||
if( member_size == 0 ) /* skip trailing zeros */
|
||||
{ while( i > Lt_size && buffer[i-9] == 0 ) --i; continue; }
|
||||
if( member_size > ipos + i || !Lt_verify_consistency( *trailer ) )
|
||||
continue;
|
||||
if( seek_read( fd, header, Lh_size,
|
||||
ipos + i - member_size ) != Lh_size )
|
||||
{ Li_set_errno_error( li, "Error reading member header: " );
|
||||
return false; }
|
||||
dictionary_size = Lh_get_dictionary_size( header );
|
||||
if( !Lh_verify_magic( header ) || !Lh_verify_version( header ) ||
|
||||
!isvalid_ds( dictionary_size ) ) continue;
|
||||
if( Lh_verify_prefix( buffer + i, bsize - i ) )
|
||||
{
|
||||
add_error( li, "Last member in input file is truncated or corrupt." );
|
||||
li->retval = 2; return false;
|
||||
}
|
||||
if( !loose_trailing && bsize - i >= Lh_size &&
|
||||
Lh_verify_corrupt( buffer + i ) )
|
||||
{ add_error( li, corrupt_mm_msg ); li->retval = 2; return false; }
|
||||
if( !ignore_trailing )
|
||||
{ add_error( li, trailing_msg ); li->retval = 2; return false; }
|
||||
*pos = ipos + i - member_size;
|
||||
return push_back_member( li, 0, Lt_get_data_size( *trailer ), *pos,
|
||||
member_size, dictionary_size );
|
||||
}
|
||||
if( ipos <= 0 )
|
||||
{ Li_set_num_error( li, "Bad trailer at pos ", *pos - Lt_size );
|
||||
return false; }
|
||||
bsize = buffer_size;
|
||||
search_size = bsize - Lh_size;
|
||||
rd_size = block_size;
|
||||
ipos -= rd_size;
|
||||
memcpy( buffer + rd_size, buffer, buffer_size - rd_size );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
bool Li_init( struct Lzip_index * const li, const int infd,
|
||||
const bool ignore_trailing, const bool loose_trailing )
|
||||
{
|
||||
Lzip_header header;
|
||||
long long pos;
|
||||
long i;
|
||||
li->member_vector = 0;
|
||||
li->error = 0;
|
||||
li->insize = lseek( infd, 0, SEEK_END );
|
||||
li->members = 0;
|
||||
li->error_size = 0;
|
||||
li->retval = 0;
|
||||
if( li->insize < 0 )
|
||||
{ Li_set_errno_error( li, "Input file is not seekable: " ); return false; }
|
||||
if( li->insize < min_member_size )
|
||||
{ add_error( li, "Input file is too short." ); li->retval = 2;
|
||||
return false; }
|
||||
if( li->insize > INT64_MAX )
|
||||
{ add_error( li, "Input file is too long (2^63 bytes or more)." );
|
||||
li->retval = 2; return false; }
|
||||
|
||||
if( seek_read( infd, header, Lh_size, 0 ) != Lh_size )
|
||||
{ Li_set_errno_error( li, "Error reading member header: " ); return false; }
|
||||
if( !Lh_verify_magic( header ) )
|
||||
{ add_error( li, bad_magic_msg ); li->retval = 2; return false; }
|
||||
if( !Lh_verify_version( header ) )
|
||||
{ add_error( li, bad_version( Lh_version( header ) ) ); li->retval = 2;
|
||||
return false; }
|
||||
if( !isvalid_ds( Lh_get_dictionary_size( header ) ) )
|
||||
{ add_error( li, bad_dict_msg ); li->retval = 2; return false; }
|
||||
|
||||
pos = li->insize; /* always points to a header or to EOF */
|
||||
while( pos >= min_member_size )
|
||||
{
|
||||
Lzip_trailer trailer;
|
||||
unsigned long long member_size;
|
||||
unsigned dictionary_size;
|
||||
if( seek_read( infd, trailer, Lt_size, pos - Lt_size ) != Lt_size )
|
||||
{ Li_set_errno_error( li, "Error reading member trailer: " ); break; }
|
||||
member_size = Lt_get_member_size( trailer );
|
||||
if( member_size > (unsigned long long)pos || !Lt_verify_consistency( trailer ) )
|
||||
{
|
||||
if( li->members <= 0 )
|
||||
{ if( Li_skip_trailing_data( li, infd, &pos, ignore_trailing,
|
||||
loose_trailing ) ) continue; else return false; }
|
||||
Li_set_num_error( li, "Bad trailer at pos ", pos - Lt_size );
|
||||
break;
|
||||
}
|
||||
if( seek_read( infd, header, Lh_size, pos - member_size ) != Lh_size )
|
||||
{ Li_set_errno_error( li, "Error reading member header: " ); break; }
|
||||
dictionary_size = Lh_get_dictionary_size( header );
|
||||
if( !Lh_verify_magic( header ) || !Lh_verify_version( header ) ||
|
||||
!isvalid_ds( dictionary_size ) )
|
||||
{
|
||||
if( li->members <= 0 )
|
||||
{ if( Li_skip_trailing_data( li, infd, &pos, ignore_trailing,
|
||||
loose_trailing ) ) continue; else return false; }
|
||||
Li_set_num_error( li, "Bad header at pos ", pos - member_size );
|
||||
break;
|
||||
}
|
||||
pos -= member_size;
|
||||
if( !push_back_member( li, 0, Lt_get_data_size( trailer ), pos,
|
||||
member_size, dictionary_size ) )
|
||||
return false;
|
||||
}
|
||||
if( pos != 0 || li->members <= 0 )
|
||||
{
|
||||
Li_free_member_vector( li );
|
||||
if( li->retval == 0 )
|
||||
{ add_error( li, "Can't create file index." ); li->retval = 2; }
|
||||
return false;
|
||||
}
|
||||
Li_reverse_member_vector( li );
|
||||
for( i = 0; ; ++i )
|
||||
{
|
||||
const long long end = block_end( li->member_vector[i].dblock );
|
||||
if( end < 0 || end > INT64_MAX )
|
||||
{
|
||||
Li_free_member_vector( li );
|
||||
add_error( li, "Data in input file is too long (2^63 bytes or more)." );
|
||||
li->retval = 2; return false;
|
||||
}
|
||||
if( i + 1 >= li->members ) break;
|
||||
li->member_vector[i+1].dblock.pos = end;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
void Li_free( struct Lzip_index * const li )
|
||||
{
|
||||
Li_free_member_vector( li );
|
||||
if( li->error ) { free( li->error ); li->error = 0; }
|
||||
li->error_size = 0;
|
||||
}
|
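`Li_init` above builds the index by walking the file backward: it reads the trailer that ends at `pos`, takes `member_size` from it, checks the header found at `pos - member_size`, and repeats until `pos` reaches 0. The sketch below reproduces only that loop over an in-memory buffer. It is an illustration under stated assumptions, not the lzip_index.c code: `count_members` and `make_fake_member` are hypothetical names, the minimum member size of 36 is assumed, and trailing-data recovery, dictionary-size checks and `Lt_verify_consistency` are left out.

```c
#include <stdint.h>
#include <string.h>

enum { trailer_size = 20, assumed_min_member_size = 36 };

/* Read the little-endian member_size field (bytes 12-19 of a trailer). */
unsigned long long trailer_member_size( const uint8_t * const trailer )
  {
  unsigned long long tmp = 0;
  int i; for( i = 19; i >= 12; --i ) { tmp <<= 8; tmp += trailer[i]; }
  return tmp;
  }

/* Walk backward from the end of 'buf', one member per step.
   Return the number of members, or -1 if the chain is broken. */
long count_members( const uint8_t * const buf, const long long size )
  {
  long long pos = size;		/* always points to a header or to EOF */
  long members = 0;
  while( pos >= assumed_min_member_size )
    {
    const unsigned long long msize =
      trailer_member_size( buf + pos - trailer_size );
    if( msize < assumed_min_member_size ||
        msize > (unsigned long long)pos ) return -1;	/* bad trailer */
    if( memcmp( buf + pos - msize, "LZIP", 4 ) != 0 ) return -1; /* bad header */
    pos -= msize;
    ++members;
    }
  return ( pos == 0 && members > 0 ) ? members : -1;
  }

/* Test fixture only: write the magic bytes and the member_size field of a
   'msize'-byte member at 'dst'. The remaining bytes are left untouched. */
void make_fake_member( uint8_t * const dst, unsigned long long msize )
  {
  uint8_t * const p = dst + msize - 8;	/* last 8 bytes of the trailer */
  int i;
  memcpy( dst, "LZIP", 4 );
  for( i = 0; i < 8; ++i ) { p[i] = (uint8_t)msize; msize >>= 8; }
  }
```

Walking from the end is what lets the index be built without decompressing anything: each trailer gives the exact offset of its own header, so the cost is one small read per member.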
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -46,45 +46,45 @@ static inline void init_member( struct Member * const m,
|
|||
{ init_block( &m->dblock, dp, ds ); init_block( &m->mblock, mp, ms );
|
||||
m->dictionary_size = dict_size; }
|
||||
|
||||
struct File_index
|
||||
struct Lzip_index
|
||||
{
|
||||
struct Member * member_vector;
|
||||
char * error;
|
||||
long long isize;
|
||||
long long insize;
|
||||
long members;
|
||||
int error_size;
|
||||
int retval;
|
||||
};
|
||||
|
||||
bool Fi_init( struct File_index * const fi, const int infd,
|
||||
bool Li_init( struct Lzip_index * const li, const int infd,
|
||||
const bool ignore_trailing, const bool loose_trailing );
|
||||
|
||||
void Fi_free( struct File_index * const fi );
|
||||
void Li_free( struct Lzip_index * const li );
|
||||
|
||||
static inline long long Fi_udata_size( const struct File_index * const fi )
|
||||
static inline long long Li_udata_size( const struct Lzip_index * const li )
|
||||
{
|
||||
if( fi->members <= 0 ) return 0;
|
||||
return block_end( fi->member_vector[fi->members-1].dblock );
|
||||
if( li->members <= 0 ) return 0;
|
||||
return block_end( li->member_vector[li->members-1].dblock );
|
||||
}
|
||||
|
||||
static inline long long Fi_cdata_size( const struct File_index * const fi )
|
||||
static inline long long Li_cdata_size( const struct Lzip_index * const li )
|
||||
{
|
||||
if( fi->members <= 0 ) return 0;
|
||||
return block_end( fi->member_vector[fi->members-1].mblock );
|
||||
if( li->members <= 0 ) return 0;
|
||||
return block_end( li->member_vector[li->members-1].mblock );
|
||||
}
|
||||
|
||||
/* total size including trailing data (if any) */
|
||||
static inline long long Fi_file_size( const struct File_index * const fi )
|
||||
{ if( fi->isize >= 0 ) return fi->isize; else return 0; }
|
||||
static inline long long Li_file_size( const struct Lzip_index * const li )
|
||||
{ if( li->insize >= 0 ) return li->insize; else return 0; }
|
||||
|
||||
static inline const struct Block * Fi_dblock( const struct File_index * const fi,
|
||||
static inline const struct Block * Li_dblock( const struct Lzip_index * const li,
|
||||
const long i )
|
||||
{ return &fi->member_vector[i].dblock; }
|
||||
{ return &li->member_vector[i].dblock; }
|
||||
|
||||
static inline const struct Block * Fi_mblock( const struct File_index * const fi,
|
||||
static inline const struct Block * Li_mblock( const struct Lzip_index * const li,
|
||||
const long i )
|
||||
{ return &fi->member_vector[i].mblock; }
|
||||
{ return &li->member_vector[i].mblock; }
|
||||
|
||||
static inline unsigned Fi_dictionary_size( const struct File_index * const fi,
|
||||
static inline unsigned Li_dictionary_size( const struct Lzip_index * const li,
|
||||
const long i )
|
||||
{ return fi->member_vector[i].dictionary_size; }
|
||||
{ return li->member_vector[i].dictionary_size; }
|
123
main.c
|
@ -1,5 +1,5 @@
|
|||
/* Clzip - LZMA lossless data compressor
|
||||
Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
|
@ -36,20 +36,25 @@
|
|||
#include <unistd.h>
|
||||
#include <utime.h>
|
||||
#include <sys/stat.h>
|
||||
#if defined(__MSVCRT__)
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
#include <io.h>
|
||||
#if defined(__MSVCRT__)
|
||||
#define fchmod(x,y) 0
|
||||
#define fchown(x,y,z) 0
|
||||
#define strtoull strtoul
|
||||
#define SIGHUP SIGTERM
|
||||
#define S_ISSOCK(x) 0
|
||||
#ifndef S_IRGRP
|
||||
#define S_IRGRP 0
|
||||
#define S_IWGRP 0
|
||||
#define S_IROTH 0
|
||||
#define S_IWOTH 0
|
||||
#endif
|
||||
#if defined(__OS2__)
|
||||
#include <io.h>
|
||||
#endif
|
||||
#if defined(__DJGPP__)
|
||||
#define S_ISSOCK(x) 0
|
||||
#define S_ISVTX 0
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#include "carg_parser.h"
|
||||
|
@ -69,9 +74,8 @@
|
|||
|
||||
int verbosity = 0;
|
||||
|
||||
const char * const Program_name = "Clzip";
|
||||
const char * const program_name = "clzip";
|
||||
const char * const program_year = "2018";
|
||||
const char * const program_year = "2019";
|
||||
const char * invocation_name = 0;
|
||||
|
||||
const struct { const char * from; const char * to; } known_extensions[] = {
|
||||
|
@ -87,6 +91,8 @@ struct Lzma_options
|
|||
|
||||
enum Mode { m_compress, m_decompress, m_list, m_test };
|
||||
|
||||
/* Variables used in signal handler context.
|
||||
They are not declared volatile because the handler never returns. */
|
||||
char * output_filename = 0;
|
||||
int outfd = -1;
|
||||
bool delete_output_on_interrupt = false;
|
||||
|
@ -94,8 +100,18 @@ bool delete_output_on_interrupt = false;
|
|||
|
||||
static void show_help( void )
|
||||
{
|
||||
printf( "%s - LZMA lossless data compressor.\n", Program_name );
|
||||
printf( "\nUsage: %s [options] [files]\n", invocation_name );
|
||||
printf( "Clzip is a C language version of lzip, fully compatible with lzip 1.4 or\n"
|
||||
"newer. As clzip is written in C, it may be easier to integrate in\n"
|
||||
"applications like package managers, embedded devices, or systems lacking\n"
|
||||
"a C++ compiler.\n"
|
||||
"\nLzip is a lossless data compressor with a user interface similar to the\n"
|
||||
"one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0)\n"
|
||||
"or compress most files more than bzip2 (lzip -9). Decompression speed is\n"
|
||||
"intermediate between gzip and bzip2. Lzip is better than gzip and bzip2\n"
|
||||
"from a data recovery perspective. Lzip has been designed, written and\n"
|
||||
"tested with great care to replace gzip and bzip2 as the standard\n"
|
||||
"general-purpose compressed format for unix-like systems.\n"
|
||||
"\nUsage: %s [options] [files]\n", invocation_name );
|
||||
printf( "\nOptions:\n"
|
||||
" -h, --help display this help and exit\n"
|
||||
" -V, --version output version information and exit\n"
|
||||
|
@ -111,7 +127,7 @@ static void show_help( void )
|
|||
" -o, --output=<file> if reading standard input, write to <file>\n"
|
||||
" -q, --quiet suppress all messages\n"
|
||||
" -s, --dictionary-size=<bytes> set dictionary size limit in bytes [8 MiB]\n"
|
||||
" -S, --volume-size=<bytes> set volume size limit in bytes, implies -k\n"
|
||||
" -S, --volume-size=<bytes> set volume size limit in bytes\n"
|
||||
" -t, --test test compressed file integrity\n"
|
||||
" -v, --verbose be verbose (a 2nd -v gives more)\n"
|
||||
" -0 .. -9 set compression level [default 6]\n"
|
||||
|
@ -227,7 +243,7 @@ static unsigned long long getnum( const char * const ptr,
|
|||
if( !errno && tail[0] )
|
||||
{
|
||||
const unsigned factor = ( tail[1] == 'i' ) ? 1024 : 1000;
|
||||
int exponent = 0; /* 0 = bad multiplier */
|
||||
int exponent = 0; /* 0 = bad multiplier */
|
||||
int i;
|
||||
switch( tail[0] )
|
||||
{
|
||||
|
@ -268,7 +284,7 @@ static int get_dict_size( const char * const arg )
|
|||
const long bits = strtol( arg, &tail, 0 );
|
||||
if( bits >= min_dictionary_bits &&
|
||||
bits <= max_dictionary_bits && *tail == 0 )
|
||||
return ( 1 << bits );
|
||||
return 1 << bits;
|
||||
return getnum( arg, min_dictionary_size, max_dictionary_size );
|
||||
}
|
||||
|
||||
|
@ -423,8 +439,17 @@ static bool check_tty( const char * const input_filename, const int infd,
|
|||
}
|
||||
|
||||
|
||||
static void set_signals( void (*action)(int) )
|
||||
{
|
||||
signal( SIGHUP, action );
|
||||
signal( SIGINT, action );
|
||||
signal( SIGTERM, action );
|
||||
}
|
||||
|
||||
|
||||
void cleanup_and_fail( const int retval )
|
||||
{
|
||||
set_signals( SIG_IGN ); /* ignore signals */
|
||||
if( delete_output_on_interrupt )
|
||||
{
|
||||
delete_output_on_interrupt = false;
|
||||
|
@ -439,6 +464,14 @@ void cleanup_and_fail( const int retval )
|
|||
}
|
||||
|
||||
|
||||
void signal_handler( int sig )
|
||||
{
|
||||
if( sig ) {} /* keep compiler happy */
|
||||
show_error( "Control-C or similar caught, quitting.", 0, false );
|
||||
cleanup_and_fail( 1 );
|
||||
}
|
||||
|
||||
|
||||
/* Set permissions, owner and times. */
|
||||
static void close_and_set_permissions( const struct stat * const in_statsp )
|
||||
{
|
||||
|
@ -518,13 +551,13 @@ static int compress( const unsigned long long cfile_size,
|
|||
}
|
||||
else
|
||||
{
|
||||
File_header header;
|
||||
if( Fh_set_dictionary_size( header, encoder_options->dictionary_size ) &&
|
||||
Lzip_header header;
|
||||
if( Lh_set_dictionary_size( header, encoder_options->dictionary_size ) &&
|
||||
encoder_options->match_len_limit >= min_match_len_limit &&
|
||||
encoder_options->match_len_limit <= max_match_len )
|
||||
encoder.e = (struct LZ_encoder *)malloc( sizeof *encoder.e );
|
||||
else internal_error( "invalid argument to encoder." );
|
||||
if( !encoder.e || !LZe_init( encoder.e, Fh_get_dictionary_size( header ),
|
||||
if( !encoder.e || !LZe_init( encoder.e, Lh_get_dictionary_size( header ),
|
||||
encoder_options->match_len_limit, infd, outfd ) )
|
||||
error = true;
|
||||
else encoder.eb = &encoder.e->eb;
|
||||
|
@ -637,16 +670,16 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
{
|
||||
int result, size;
|
||||
unsigned dictionary_size;
|
||||
File_header header;
|
||||
Lzip_header header;
|
||||
struct LZ_decoder decoder;
|
||||
Rd_reset_member_position( &rdec );
|
||||
size = Rd_read_data( &rdec, header, Fh_size );
|
||||
size = Rd_read_data( &rdec, header, Lh_size );
|
||||
if( Rd_finished( &rdec ) ) /* End Of File */
|
||||
{
|
||||
if( first_member )
|
||||
{ show_file_error( pp->name, "File ends unexpectedly at member header.", 0 );
|
||||
retval = 2; }
|
||||
else if( Fh_verify_prefix( header, size ) )
|
||||
else if( Lh_verify_prefix( header, size ) )
|
||||
{ Pp_show_msg( pp, "Truncated header in multimember file." );
|
||||
show_trailing_data( header, size, pp, true, -1 );
|
||||
retval = 2; }
|
||||
|
@ -655,11 +688,11 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
retval = 2;
|
||||
break;
|
||||
}
|
||||
if( !Fh_verify_magic( header ) )
|
||||
if( !Lh_verify_magic( header ) )
|
||||
{
|
||||
if( first_member )
|
||||
{ show_file_error( pp->name, bad_magic_msg, 0 ); retval = 2; }
|
||||
else if( !loose_trailing && Fh_verify_corrupt( header ) )
|
||||
else if( !loose_trailing && Lh_verify_corrupt( header ) )
|
||||
{ Pp_show_msg( pp, corrupt_mm_msg );
|
||||
show_trailing_data( header, size, pp, false, -1 );
|
||||
retval = 2; }
|
||||
|
@ -667,10 +700,10 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
retval = 2;
|
||||
break;
|
||||
}
|
||||
if( !Fh_verify_version( header ) )
|
||||
{ Pp_show_msg( pp, bad_version( Fh_version( header ) ) );
|
||||
if( !Lh_verify_version( header ) )
|
||||
{ Pp_show_msg( pp, bad_version( Lh_version( header ) ) );
|
||||
retval = 2; break; }
|
||||
dictionary_size = Fh_get_dictionary_size( header );
|
||||
dictionary_size = Lh_get_dictionary_size( header );
|
||||
if( !isvalid_ds( dictionary_size ) )
|
||||
{ Pp_show_msg( pp, bad_dict_msg ); retval = 2; break; }
|
||||
|
||||
|
@ -689,7 +722,8 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
{
|
||||
Pp_show_msg( pp, 0 );
|
||||
fprintf( stderr, "%s at pos %llu\n", ( result == 2 ) ?
|
||||
"File ends unexpectedly" : "Decoder error", partial_file_pos );
|
||||
"File ends unexpectedly" : "Decoder error",
|
||||
partial_file_pos );
|
||||
}
|
||||
retval = 2; break;
|
||||
}
|
||||
|
@ -703,31 +737,13 @@ static int decompress( const unsigned long long cfile_size, const int infd,
|
|||
}
|
||||
|
||||
|
||||
void signal_handler( int sig )
|
||||
{
|
||||
if( sig ) {} /* keep compiler happy */
|
||||
show_error( "Control-C or similar caught, quitting.", 0, false );
|
||||
cleanup_and_fail( 1 );
|
||||
}
|
||||
|
||||
|
||||
static void set_signals( void )
|
||||
{
|
||||
signal( SIGHUP, signal_handler );
|
||||
signal( SIGINT, signal_handler );
|
||||
signal( SIGTERM, signal_handler );
|
||||
}
|
||||
|
||||
|
||||
void show_error( const char * const msg, const int errcode, const bool help )
|
||||
{
|
||||
if( verbosity < 0 ) return;
|
||||
if( msg && msg[0] )
|
||||
{
|
||||
fprintf( stderr, "%s: %s", program_name, msg );
|
||||
if( errcode > 0 ) fprintf( stderr, ": %s", strerror( errcode ) );
|
||||
fputc( '\n', stderr );
|
||||
}
|
||||
fprintf( stderr, "%s: %s%s%s\n", program_name, msg,
|
||||
( errcode > 0 ) ? ": " : "",
|
||||
( errcode > 0 ) ? strerror( errcode ) : "" );
|
||||
if( help )
|
||||
fprintf( stderr, "Try '%s --help' for more information.\n",
|
||||
invocation_name );
|
||||
|
@ -737,10 +753,10 @@ void show_error( const char * const msg, const int errcode, const bool help )
|
|||
void show_file_error( const char * const filename, const char * const msg,
|
||||
const int errcode )
|
||||
{
|
||||
if( verbosity < 0 ) return;
|
||||
fprintf( stderr, "%s: %s: %s", program_name, filename, msg );
|
||||
if( errcode > 0 ) fprintf( stderr, ": %s", strerror( errcode ) );
|
||||
fputc( '\n', stderr );
|
||||
if( verbosity >= 0 )
|
||||
fprintf( stderr, "%s: %s: %s%s%s\n", program_name, filename, msg,
|
||||
( errcode > 0 ) ? ": " : "",
|
||||
( errcode > 0 ) ? strerror( errcode ) : "" );
|
||||
}
|
||||
|
||||
|
||||
|
@ -933,7 +949,7 @@ int main( const int argc, const char * const argv[] )
|
|||
}
|
||||
} /* end process options */
|
||||
|
||||
#if defined(__MSVCRT__) || defined(__OS2__)
|
||||
#if defined(__MSVCRT__) || defined(__OS2__) || defined(__DJGPP__)
|
||||
setmode( STDIN_FILENO, O_BINARY );
|
||||
setmode( STDOUT_FILENO, O_BINARY );
|
||||
#endif
|
||||
|
@ -961,7 +977,7 @@ int main( const int argc, const char * const argv[] )
|
|||
|
||||
if( !to_stdout && program_mode != m_test &&
|
||||
( filenames_given || default_output_filename[0] ) )
|
||||
set_signals();
|
||||
set_signals( signal_handler );
|
||||
|
||||
Pp_init( &pp, filenames, num_filenames );
|
||||
|
||||
|
@ -1044,6 +1060,12 @@ int main( const int argc, const char * const argv[] )
|
|||
else
|
||||
tmp = decompress( cfile_size, infd, &pp, ignore_trailing,
|
||||
loose_trailing, program_mode == m_test );
|
||||
if( close( infd ) != 0 )
|
||||
{
|
||||
show_error( input_filename[0] ? "Error closing input file" :
|
||||
"Error closing stdin", errno, false );
|
||||
if( tmp < 1 ) tmp = 1;
|
||||
}
|
||||
if( tmp > retval ) retval = tmp;
|
||||
if( tmp )
|
||||
{ if( program_mode != m_test ) cleanup_and_fail( retval );
|
||||
|
@ -1053,7 +1075,6 @@ int main( const int argc, const char * const argv[] )
|
|||
close_and_set_permissions( in_statsp );
|
||||
if( input_filename[0] )
|
||||
{
|
||||
close( infd );
|
||||
if( !keep_input_files && !to_stdout && program_mode != m_test &&
|
||||
( program_mode != m_compress || volume_size == 0 ) )
|
||||
remove( input_filename );
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
#! /bin/sh
|
||||
# check script for Clzip - LZMA lossless data compressor
|
||||
# Copyright (C) 2010-2018 Antonio Diaz Diaz.
|
||||
# Copyright (C) 2010-2019 Antonio Diaz Diaz.
|
||||
#
|
||||
# This script is free software: you have unlimited permission
|
||||
# to copy, distribute and modify it.
|
||||
|
@ -36,12 +36,15 @@ test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; }
|
|||
printf "testing clzip-%s..." "$2"
|
||||
|
||||
"${LZIP}" -fkqm4 in
|
||||
{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
[ ! -e in.lz ] || test_failed $LINENO
|
||||
"${LZIP}" -fkqm274 in
|
||||
{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
[ ! -e in.lz ] || test_failed $LINENO
|
||||
for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do
|
||||
"${LZIP}" -fkqs $i in
|
||||
{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO $i
|
||||
[ $? = 1 ] || test_failed $LINENO $i
|
||||
[ ! -e in.lz ] || test_failed $LINENO $i
|
||||
done
|
||||
"${LZIP}" -lq in
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
|
@ -91,31 +94,34 @@ printf "\ntesting decompression..."
|
|||
"${LZIP}" -cd "${in_lz}" > copy || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
|
||||
rm -f copy
|
||||
rm -f copy || framework_failure
|
||||
cat "${in_lz}" > copy.lz || framework_failure
|
||||
"${LZIP}" -dk copy.lz || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
printf "to be overwritten" > copy || framework_failure
|
||||
"${LZIP}" -d copy.lz 2> /dev/null
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
"${LZIP}" -df copy.lz
|
||||
{ [ $? = 0 ] && [ ! -e copy.lz ] && cmp in copy ; } || test_failed $LINENO
|
||||
"${LZIP}" -df copy.lz || test_failed $LINENO
|
||||
[ ! -e copy.lz ] || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
|
||||
rm -f copy
|
||||
rm -f copy || framework_failure
|
||||
cat "${in_lz}" > copy.lz || framework_failure
|
||||
"${LZIP}" -d -S100k copy.lz
|
||||
{ [ $? = 0 ] && [ ! -e copy.lz ] && cmp in copy ; } || test_failed $LINENO
|
||||
"${LZIP}" -d -S100k copy.lz || test_failed $LINENO # ignore -S
|
||||
[ ! -e copy.lz ] || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
|
||||
printf "to be overwritten" > copy || framework_failure
|
||||
"${LZIP}" -df -o copy < "${in_lz}" || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
|
||||
rm -f copy
|
||||
rm -f copy || framework_failure
|
||||
"${LZIP}" < in > anyothername || test_failed $LINENO
|
||||
"${LZIP}" -dv --output copy - anyothername - < "${in_lz}" 2> /dev/null
|
||||
{ [ $? = 0 ] && cmp in copy && cmp in anyothername.out ; } ||
|
||||
"${LZIP}" -dv --output copy - anyothername - < "${in_lz}" 2> /dev/null ||
|
||||
test_failed $LINENO
|
||||
rm -f copy anyothername.out
|
||||
cmp in copy || test_failed $LINENO
|
||||
cmp in anyothername.out || test_failed $LINENO
|
||||
rm -f copy anyothername.out || framework_failure
|
||||
|
||||
"${LZIP}" -lq in "${in_lz}"
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
|
@ -126,10 +132,12 @@ rm -f copy anyothername.out
|
|||
"${LZIP}" -tq nx_file.lz "${in_lz}"
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
"${LZIP}" -cdq in "${in_lz}" > copy
|
||||
{ [ $? = 2 ] && cat copy in | cmp in - ; } || test_failed $LINENO
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
cat copy in | cmp in - || test_failed $LINENO
|
||||
"${LZIP}" -cdq nx_file.lz "${in_lz}" > copy
|
||||
{ [ $? = 1 ] && cmp in copy ; } || test_failed $LINENO
|
||||
rm -f copy
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
rm -f copy || framework_failure
|
||||
cat "${in_lz}" > copy.lz || framework_failure
|
||||
for i in 1 2 3 4 5 6 7 ; do
|
||||
printf "g" >> copy.lz || framework_failure
|
||||
|
@ -139,11 +147,15 @@ for i in 1 2 3 4 5 6 7 ; do
|
|||
[ $? = 2 ] || test_failed $LINENO $i
|
||||
done
|
||||
"${LZIP}" -dq in copy.lz
|
||||
{ [ $? = 2 ] && [ -e copy.lz ] && [ ! -e copy ] && [ ! -e in.out ] ; } ||
|
||||
test_failed $LINENO
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
[ -e copy.lz ] || test_failed $LINENO
|
||||
[ ! -e copy ] || test_failed $LINENO
|
||||
[ ! -e in.out ] || test_failed $LINENO
|
||||
"${LZIP}" -dq nx_file.lz copy.lz
|
||||
{ [ $? = 1 ] && [ ! -e copy.lz ] && [ ! -e nx_file ] && cmp in copy ; } ||
|
||||
test_failed $LINENO
|
||||
[ $? = 1 ] || test_failed $LINENO
|
||||
[ ! -e copy.lz ] || test_failed $LINENO
|
||||
[ ! -e nx_file ] || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
|
||||
cat in in > in2 || framework_failure
|
||||
cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
|
||||
|
@ -160,7 +172,7 @@ cmp in2 copy2 || test_failed $LINENO
|
|||
|
||||
printf "\ngarbage" >> copy2.lz || framework_failure
|
||||
"${LZIP}" -tvvvv copy2.lz 2> /dev/null || test_failed $LINENO
|
||||
rm -f copy2
|
||||
rm -f copy2 || framework_failure
|
||||
"${LZIP}" -alq copy2.lz
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
"${LZIP}" -atq copy2.lz
|
||||
|
@ -168,12 +180,15 @@ rm -f copy2
|
|||
"${LZIP}" -atq < copy2.lz
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
"${LZIP}" -adkq copy2.lz
|
||||
{ [ $? = 2 ] && [ ! -e copy2 ] ; } || test_failed $LINENO
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
[ ! -e copy2 ] || test_failed $LINENO
|
||||
"${LZIP}" -adkq -o copy2 < copy2.lz
|
||||
{ [ $? = 2 ] && [ ! -e copy2 ] ; } || test_failed $LINENO
|
||||
[ $? = 2 ] || test_failed $LINENO
|
||||
[ ! -e copy2 ] || test_failed $LINENO
|
||||
printf "to be overwritten" > copy2 || framework_failure
|
||||
"${LZIP}" -df copy2.lz || test_failed $LINENO
|
||||
cmp in2 copy2 || test_failed $LINENO
|
||||
rm -f in2 copy2 || framework_failure
|
||||
|
||||
printf "\ntesting compression..."
|
||||
|
||||
|
@ -209,73 +224,94 @@ for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
|
|||
"${LZIP}" -df -o copy < out.lz || test_failed $LINENO $i
|
||||
cmp in copy || test_failed $LINENO $i
|
||||
done
|
||||
rm -f out.lz || framework_failure
|
||||
|
||||
cat in in in in in in in in > in8 || framework_failure
|
||||
"${LZIP}" -1s12 -S100k in8 || test_failed $LINENO
|
||||
"${LZIP}" -t in800001.lz in800002.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd in800001.lz in800002.lz | cmp in8 - || test_failed $LINENO
|
||||
rm -f in800001.lz in800002.lz
|
||||
rm -f in800001.lz in800002.lz || framework_failure
|
||||
"${LZIP}" -1s12 -S100k -o out.lz < in8 || test_failed $LINENO
|
||||
"${LZIP}" -t out.lz00001.lz out.lz00002.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd out.lz00001.lz out.lz00002.lz | cmp in8 - || test_failed $LINENO
|
||||
rm -f out.lz00001.lz out.lz00002.lz
|
||||
rm -f out.lz00001.lz out.lz00002.lz || framework_failure
|
||||
"${LZIP}" -1ks4Ki -b100000 in8 || test_failed $LINENO
|
||||
"${LZIP}" -t in8.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd in8.lz | cmp in8 - || test_failed $LINENO
|
||||
rm -f in8
|
||||
rm -f in8 || framework_failure
|
||||
"${LZIP}" -0 -S100k -o out < in8.lz || test_failed $LINENO
|
||||
"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO
|
||||
rm -f out00001.lz
|
||||
rm -f out00001.lz || framework_failure
|
||||
"${LZIP}" -1 -S100k -o out < in8.lz || test_failed $LINENO
|
||||
"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO
|
||||
rm -f out00001.lz out00002.lz
|
||||
rm -f out00001.lz out00002.lz || framework_failure
|
||||
"${LZIP}" -0 -F -S100k in8.lz || test_failed $LINENO
|
||||
"${LZIP}" -t in8.lz00001.lz in8.lz00002.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd in8.lz00001.lz in8.lz00002.lz | cmp in8.lz - || test_failed $LINENO
|
||||
rm -f in8.lz00001.lz in8.lz00002.lz
|
||||
rm -f in8.lz00001.lz in8.lz00002.lz || framework_failure
|
||||
"${LZIP}" -0kF -b100k in8.lz || test_failed $LINENO
|
||||
"${LZIP}" -t in8.lz.lz || test_failed $LINENO
|
||||
"${LZIP}" -cd in8.lz.lz | cmp in8.lz - || test_failed $LINENO
|
||||
rm -f in8.lz in8.lz.lz
|
||||
rm -f in8.lz in8.lz.lz || framework_failure
|
||||
|
||||
printf "\ntesting bad input..."
|
||||
|
||||
headers='LZIp LZiP LZip LzIP LzIp LziP lZIP lZIp lZiP lzIP'
|
||||
body='\001\014\000\203\377\373\377\377\300\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\000\000\000\000'
|
||||
cat "${in_lz}" > in0.lz
|
||||
printf "LZIP${body}" >> in0.lz
|
||||
if "${LZIP}" -tq in0.lz ; then
|
||||
cat "${in_lz}" > int.lz
|
||||
printf "LZIP${body}" >> int.lz
|
||||
if "${LZIP}" -tq int.lz ; then
|
||||
for header in ${headers} ; do
|
||||
printf "${header}${body}" > in0.lz # first member
|
||||
"${LZIP}" -lq in0.lz
|
||||
printf "${header}${body}" > int.lz # first member
|
||||
"${LZIP}" -lq int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq in0.lz
|
||||
"${LZIP}" -tq int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -lq --loose-trailing in0.lz
|
||||
"${LZIP}" -tq < int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq --loose-trailing in0.lz
|
||||
"${LZIP}" -cdq int.lz > /dev/null
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
cat "${in_lz}" > in0.lz
|
||||
printf "${header}${body}" >> in0.lz # trailing data
|
||||
"${LZIP}" -lq in0.lz
|
||||
"${LZIP}" -lq --loose-trailing int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq in0.lz
|
||||
"${LZIP}" -tq --loose-trailing int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -lq --loose-trailing in0.lz
|
||||
[ $? = 0 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -t --loose-trailing in0.lz
|
||||
[ $? = 0 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -lq --loose-trailing --trailing-error in0.lz
|
||||
"${LZIP}" -tq --loose-trailing < int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq --loose-trailing --trailing-error in0.lz
|
||||
"${LZIP}" -cdq --loose-trailing int.lz > /dev/null
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
cat "${in_lz}" > int.lz
|
||||
printf "${header}${body}" >> int.lz # trailing data
|
||||
"${LZIP}" -lq int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq < int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -cdq int.lz > /dev/null
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -lq --loose-trailing int.lz ||
|
||||
test_failed $LINENO ${header}
|
||||
"${LZIP}" -t --loose-trailing int.lz ||
|
||||
test_failed $LINENO ${header}
|
||||
"${LZIP}" -t --loose-trailing < int.lz ||
|
||||
test_failed $LINENO ${header}
|
||||
"${LZIP}" -cd --loose-trailing int.lz > /dev/null ||
|
||||
test_failed $LINENO ${header}
|
||||
"${LZIP}" -lq --loose-trailing --trailing-error int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq --loose-trailing --trailing-error int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -tq --loose-trailing --trailing-error < int.lz
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
"${LZIP}" -cdq --loose-trailing --trailing-error int.lz > /dev/null
|
||||
[ $? = 2 ] || test_failed $LINENO ${header}
|
||||
done
|
||||
else
|
||||
printf "\nwarning: skipping header test: 'printf' does not work on your system."
|
||||
fi
|
||||
rm -f in0.lz
|
||||
rm -f int.lz || framework_failure
|
||||
|
||||
cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure
|
||||
if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
|
||||
|
@ -296,7 +332,7 @@ if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
|
|||
else
|
||||
printf "\nwarning: skipping truncation test: 'dd' does not work on your system."
|
||||
fi
|
||||
rm -f in3.lz trunc.lz
|
||||
rm -f in2.lz in3.lz trunc.lz out || framework_failure
|
||||
|
||||
cat "${in_lz}" > ingin.lz || framework_failure
|
||||
printf "g" >> ingin.lz || framework_failure
|
||||
|
@ -309,7 +345,7 @@ cmp in copy || test_failed $LINENO
|
|||
"${LZIP}" -t < ingin.lz || test_failed $LINENO
|
||||
"${LZIP}" -d < ingin.lz > copy || test_failed $LINENO
|
||||
cmp in copy || test_failed $LINENO
|
||||
rm -f ingin.lz
|
||||
rm -f copy ingin.lz || framework_failure
|
||||
|
||||
echo
|
||||
if [ ${fail} = 0 ] ; then
|
||||
|
|
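Note on the check-script hunks: several compound checks of the form `{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO` are split into one check per line, so that a failure reports the line of the specific condition that did not hold. The following is a minimal sketch of that pattern, not part of the commit. `test_failed` is copied from the script as shown in the hunk header; `fake_tool` and the temporary directory are hypothetical stand-ins for `"${LZIP}"` and its working files.

```shell
#!/bin/sh
# Sketch only: illustrates the one-check-per-line style used in the new script.
fail=0
test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; }

# Hypothetical command under test: exits with status 1 and writes no output file.
fake_tool() { return 1 ; }

tmpdir=$(mktemp -d) || exit 1

fake_tool
# The exit status must be checked on the line right after the command,
# because any later command overwrites $?.
[ $? = 1 ] || test_failed $LINENO
# The file check is now independent, so its own line number is reported on failure.
[ ! -e "${tmpdir}/in.lz" ] || test_failed $LINENO

rmdir "${tmpdir}"
echo "fail=${fail}"
```

With the compound form, a failure only identifies the line of the whole group. With separate checks, the reported `$LINENO` points at the exit-status test or at the file test individually.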