Merging upstream version 0.8.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent
c7152715b0
commit
d91c44b5bd
28 changed files with 2668 additions and 574 deletions
36
ChangeLog
|
@ -1,3 +1,39 @@
|
||||||
|
2018-12-16 Antonio Diaz Diaz <antonio@gnu.org>
|
||||||
|
|
||||||
|
* Version 0.8 released.
|
||||||
|
* Added new option '--anonymous' (--owner=root --group=root).
|
||||||
|
* extract.cc (decode): 'tarlz -xf foo ./bar' now extracts 'bar'.
|
||||||
|
* create.cc: Set to zero most fields in extended headers.
|
||||||
|
* tarlz.texi: Added new chapter 'Amendments to pax format'.
|
||||||
|
|
||||||
|
2018-11-23 Antonio Diaz Diaz <antonio@gnu.org>
|
||||||
|
|
||||||
|
* Version 0.7 released.
|
||||||
|
* Added new option '--keep-damaged'.
|
||||||
|
* Added new option '--no-solid'.
|
||||||
|
* create.cc (archive_write): Minimize dictionary size.
|
||||||
|
* create.cc: Detect and skip archive in '-A', '-c' and '-r'.
|
||||||
|
* main.cc (show_version): Show the version of lzlib being used.
|
||||||
|
|
||||||
|
2018-10-19 Antonio Diaz Diaz <antonio@gnu.org>
|
||||||
|
|
||||||
|
* Version 0.6 released.
|
||||||
|
* Added new option '-A, --concatenate'.
|
||||||
|
* Option '--ignore-crc' replaced with '--missing-crc'.
|
||||||
|
* create.cc (add_member): Test that uid, gid, mtime, devmajor
|
||||||
|
and devminor are in ustar range.
|
||||||
|
* configure: Accept appending to CXXFLAGS, 'CXXFLAGS+=OPTIONS'.
|
||||||
|
* Makefile.in: Use tarlz in target 'dist'.
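The "ustar range" mentioned in the 0.6 entry follows from the width of the octal header fields. Below is a minimal sketch, assuming each numeric field reserves its last byte for a terminator (as the `print_octal( ..., uid_l - 1, ... )` calls in create.cc suggest). The helper names `octal_field_max` and `in_ustar_range` are hypothetical and do not appear in tarlz.

```cpp
// Largest value that fits in an octal field of 'field_len' bytes when the
// last byte is reserved for the terminator (field_len - 1 octal digits).
// Each octal digit carries 3 bits.
unsigned long long octal_field_max( const int field_len )
  { return ( 1ULL << ( 3 * ( field_len - 1 ) ) ) - 1; }

// True if 'value' can be stored in a field of 'field_len' bytes.
bool in_ustar_range( const unsigned long long value, const int field_len )
  { return value <= octal_field_max( field_len ); }
```

With the standard ustar widths (8 bytes for uid and gid, 12 bytes for mtime) this gives the limits 2_097_151 and 8_589_934_591 quoted in the error messages of add_member.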
|
||||||
|
|
||||||
|
2018-09-29 Antonio Diaz Diaz <antonio@gnu.org>
|
||||||
|
|
||||||
|
* Version 0.5 released.
|
||||||
|
* Implemented simplified posix pax format.
|
||||||
|
* Implemented CRC32-C (Castagnoli) of the extended header data.
|
||||||
|
* Added new option '--ignore-crc'.
|
||||||
|
* Added missing #includes for major, minor and makedev.
|
||||||
|
* tarlz.texi: Documented the new archive format.
|
||||||
|
|
||||||
2018-04-23 Antonio Diaz Diaz <antonio@gnu.org>
|
2018-04-23 Antonio Diaz Diaz <antonio@gnu.org>
|
||||||
|
|
||||||
* Version 0.4 released.
|
* Version 0.4 released.
|
||||||
|
|
7
INSTALL
|
@ -3,6 +3,9 @@ Requirements
|
||||||
You will need a C++ compiler and the lzlib compression library installed.
|
You will need a C++ compiler and the lzlib compression library installed.
|
||||||
I use gcc 5.3.0 and 4.1.2, but the code should compile with any
|
I use gcc 5.3.0 and 4.1.2, but the code should compile with any
|
||||||
standards compliant compiler.
|
standards compliant compiler.
|
||||||
|
Lzlib must be version 1.0 or newer, but --keep-damaged requires lzlib
|
||||||
|
1.11-rc2 or newer to recover as much data as possible from each damaged
|
||||||
|
member.
|
||||||
Gcc is available at http://gcc.gnu.org.
|
Gcc is available at http://gcc.gnu.org.
|
||||||
Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.
|
Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.
|
||||||
|
|
||||||
|
@ -24,6 +27,10 @@ the main archive.
|
||||||
cd tarlz[version]
|
cd tarlz[version]
|
||||||
./configure
|
./configure
|
||||||
|
|
||||||
|
To link against a lzlib not installed in a standard place, use:
|
||||||
|
|
||||||
|
./configure CPPFLAGS='-I<dir_of_lzlib.h>' LDFLAGS='-L<dir_of_liblz.a>'
|
||||||
|
|
||||||
3. Run make.
|
3. Run make.
|
||||||
|
|
||||||
make
|
make
|
||||||
|
|
13
Makefile.in
|
@ -101,7 +101,7 @@ uninstall-man :
|
||||||
|
|
||||||
dist : doc
|
dist : doc
|
||||||
ln -sf $(VPATH) $(DISTNAME)
|
ln -sf $(VPATH) $(DISTNAME)
|
||||||
tar -Hustar --owner=root --group=root -cvf $(DISTNAME).tar \
|
tarlz --solid --owner=root --group=root -9cvf $(DISTNAME).tar.lz \
|
||||||
$(DISTNAME)/AUTHORS \
|
$(DISTNAME)/AUTHORS \
|
||||||
$(DISTNAME)/COPYING \
|
$(DISTNAME)/COPYING \
|
||||||
$(DISTNAME)/ChangeLog \
|
$(DISTNAME)/ChangeLog \
|
||||||
|
@ -118,17 +118,24 @@ dist : doc
|
||||||
$(DISTNAME)/testsuite/check.sh \
|
$(DISTNAME)/testsuite/check.sh \
|
||||||
$(DISTNAME)/testsuite/test.txt \
|
$(DISTNAME)/testsuite/test.txt \
|
||||||
$(DISTNAME)/testsuite/test.txt.tar \
|
$(DISTNAME)/testsuite/test.txt.tar \
|
||||||
|
$(DISTNAME)/testsuite/test_bad1.txt.tar \
|
||||||
|
$(DISTNAME)/testsuite/test_bad[12].txt \
|
||||||
|
$(DISTNAME)/testsuite/t155.tar \
|
||||||
$(DISTNAME)/testsuite/test3.tar \
|
$(DISTNAME)/testsuite/test3.tar \
|
||||||
$(DISTNAME)/testsuite/test3_bad[1-5].tar \
|
$(DISTNAME)/testsuite/test3_bad[1-5].tar \
|
||||||
$(DISTNAME)/testsuite/test.txt.lz \
|
$(DISTNAME)/testsuite/test.txt.lz \
|
||||||
$(DISTNAME)/testsuite/test.txt.tar.lz \
|
$(DISTNAME)/testsuite/test.txt.tar.lz \
|
||||||
|
$(DISTNAME)/testsuite/test_bad[12].txt.tar.lz \
|
||||||
$(DISTNAME)/testsuite/test3.tar.lz \
|
$(DISTNAME)/testsuite/test3.tar.lz \
|
||||||
$(DISTNAME)/testsuite/test3a.tar.lz \
|
$(DISTNAME)/testsuite/tlz_in_tar[12].tar \
|
||||||
|
$(DISTNAME)/testsuite/test3_dir.tar.lz \
|
||||||
|
$(DISTNAME)/testsuite/test3_dot.tar.lz \
|
||||||
|
$(DISTNAME)/testsuite/t155.tar.lz \
|
||||||
$(DISTNAME)/testsuite/test3_bad[1-6].tar.lz \
|
$(DISTNAME)/testsuite/test3_bad[1-6].tar.lz \
|
||||||
$(DISTNAME)/testsuite/dotdot[1-5].tar.lz \
|
$(DISTNAME)/testsuite/dotdot[1-5].tar.lz \
|
||||||
|
$(DISTNAME)/testsuite/ug32chars.tar.lz \
|
||||||
$(DISTNAME)/testsuite/eof.tar.lz
|
$(DISTNAME)/testsuite/eof.tar.lz
|
||||||
rm -f $(DISTNAME)
|
rm -f $(DISTNAME)
|
||||||
lzip -v -9 $(DISTNAME).tar
|
|
||||||
|
|
||||||
clean :
|
clean :
|
||||||
-rm -f $(progname) $(objs)
|
-rm -f $(progname) $(objs)
|
||||||
|
|
19
NEWS
|
@ -1,5 +1,18 @@
|
||||||
Changes in version 0.4:
|
Changes in version 0.8:
|
||||||
|
|
||||||
Some missing #includes have been fixed.
|
The new option '--anonymous', equivalent to '--owner=root --group=root', has
|
||||||
|
been added.
|
||||||
|
|
||||||
Open files in binary mode on OS2.
|
On extraction and listing, tarlz now removes leading './' strings also from
|
||||||
|
member names given in the command line. 'tarlz -xf foo ./bar' now extracts
|
||||||
|
member 'bar' from archive 'foo'. (Reported by Viktor Sergiienko in the
|
||||||
|
bug-tar mailing list).
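The name normalization described above can be sketched as follows. This is an illustration only, not the code in extract.cc: the helper `strip_leading_dotslash` is hypothetical, and collapsing repeated slashes after the dot is an assumption.

```cpp
#include <string>

// Hypothetical helper: remove every leading "./" from a member name, also
// skipping any extra slashes that follow it (".//bar" becomes "bar").
// Names starting with "../" or with a plain dot (".bar") are left untouched.
std::string strip_leading_dotslash( const std::string & name )
  {
  std::string::size_type i = 0;
  while( i + 1 < name.size() && name[i] == '.' && name[i+1] == '/' )
    {
    i += 2;
    while( i < name.size() && name[i] == '/' ) ++i;
    }
  return name.substr( i );
  }
```

Applying the same normalization to both the names given in the command line and the names stored in the archive is what lets './bar' match member 'bar'.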
|
||||||
|
|
||||||
|
Tarlz now writes extended headers with all fields zeroed except size,
|
||||||
|
chksum, typeflag, magic and version. This prevents old tar programs from
|
||||||
|
extracting the extended records as a file in the wrong place (with a
|
||||||
|
truncated filename). Tarlz now also sets to zero those fields of the ustar
|
||||||
|
header overridden by extended records.
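A minimal sketch of such a mostly zeroed extended header is shown below. It assumes the standard POSIX ustar field offsets and the 'x' typeflag for pax extended headers. The function `make_extended_header` is a hypothetical name, and the constants are redefined here so the example stands alone rather than being taken from tarlz.h.

```cpp
#include <cstdint>
#include <cstring>

// Standard ustar layout (offsets and lengths in bytes).
enum { header_size = 512, size_o = 124, size_l = 12,
       chksum_o = 148, chksum_l = 8, typeflag_o = 156,
       magic_o = 257, version_o = 263 };

// Write 'num' as zero-padded octal digits into buf[0..size-1].
void print_octal( uint8_t * const buf, int size, unsigned long long num )
  { while( --size >= 0 ) { buf[size] = '0' + ( num % 8 ); num /= 8; } }

// ustar checksum: sum of all header bytes, with the chksum field itself
// counted as if it were filled with spaces.
unsigned ustar_chksum( const uint8_t * const header )
  {
  unsigned chksum = chksum_l * 0x20;
  for( int i = 0; i < chksum_o; ++i ) chksum += header[i];
  for( int i = chksum_o + chksum_l; i < header_size; ++i ) chksum += header[i];
  return chksum;
  }

// Fill only size, chksum, typeflag, magic and version; leave the rest zeroed.
void make_extended_header( uint8_t * const header,
                           const unsigned long long edsize )
  {
  std::memset( header, 0, header_size );
  std::memcpy( header + magic_o, "ustar", 5 );      // sixth magic byte stays 0
  header[version_o] = header[version_o+1] = '0';
  header[typeflag_o] = 'x';                         // pax extended header
  print_octal( header + size_o, size_l - 1, edsize );
  print_octal( header + chksum_o, chksum_l - 1, ustar_chksum( header ) );
  }
```

Because the name field stays empty, a tar program that does not understand the 'x' typeflag has no truncated filename to extract the extended records to.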
|
||||||
|
|
||||||
|
The chapter 'Amendments to pax format', explaining the reasons for the
|
||||||
|
differences with the pax format, has been added.
|
||||||
|
|
77
README
|
@ -1,36 +1,16 @@
|
||||||
Description
|
Description
|
||||||
|
|
||||||
Tarlz is a small and simple implementation of the tar archiver. By
|
Tarlz is a small and simple implementation of the tar archiver. By default
|
||||||
default tarlz creates, lists and extracts archives in the 'ustar' format
|
tarlz creates, lists and extracts archives in a simplified posix pax format
|
||||||
compressed with lzip on a per file basis. Tarlz can append files to the
|
compressed with lzip on a per file basis. Each tar member is compressed in
|
||||||
end of such compressed archives.
|
its own lzip member, as well as the end-of-file blocks. This method is fully
|
||||||
|
backward compatible with standard tar tools like GNU tar, which treat the
|
||||||
Each tar member is compressed in its own lzip member, as well as the
|
resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
|
||||||
end-of-file blocks. This same method works for any tar format (gnu,
|
can append files to the end of such compressed archives.
|
||||||
ustar, posix) and is fully backward compatible with standard tar tools
|
|
||||||
like GNU tar, which treat the resulting multimember tar.lz archive like
|
|
||||||
any other tar.lz archive.
|
|
||||||
|
|
||||||
Tarlz can create tar archives with four levels of compression
|
Tarlz can create tar archives with four levels of compression
|
||||||
granularity; per file, per directory, appendable solid, and solid.
|
granularity; per file, per directory, appendable solid, and solid.
|
||||||
|
|
||||||
Tarlz is intended as a showcase project for the maintainers of real tar
|
|
||||||
programs to evaluate the format and perhaps implement it in their tools.
|
|
||||||
|
|
||||||
The diagram below shows the correspondence between tar members (formed
|
|
||||||
by a header plus optional data) in the tar archive and lzip members in
|
|
||||||
the resulting multimember tar.lz archive:
|
|
||||||
|
|
||||||
tar
|
|
||||||
+========+======+========+======+========+======+========+
|
|
||||||
| header | data | header | data | header | data | eof |
|
|
||||||
+========+======+========+======+========+======+========+
|
|
||||||
|
|
||||||
tar.lz
|
|
||||||
+===============+===============+===============+========+
|
|
||||||
| member | member | member | member |
|
|
||||||
+===============+===============+===============+========+
|
|
||||||
|
|
||||||
Of course, compressing each file (or each directory) individually is
|
Of course, compressing each file (or each directory) individually is
|
||||||
less efficient than compressing the whole tar archive, but it has the
|
less efficient than compressing the whole tar archive, but it has the
|
||||||
following advantages:
|
following advantages:
|
||||||
|
@ -38,19 +18,56 @@ following advantages:
|
||||||
* The resulting multimember tar.lz archive can be decompressed in
|
* The resulting multimember tar.lz archive can be decompressed in
|
||||||
parallel with plzip, multiplying the decompression speed.
|
parallel with plzip, multiplying the decompression speed.
|
||||||
|
|
||||||
* New members can be appended to the archive (by removing the eof
|
* New members can be appended to the archive (by removing the EOF
|
||||||
member) just like to an uncompressed tar archive.
|
member) just like to an uncompressed tar archive.
|
||||||
|
|
||||||
* It is a safe posix-style backup format. In case of corruption,
|
* It is a safe posix-style backup format. In case of corruption,
|
||||||
tarlz can extract all the undamaged members from the tar.lz
|
tarlz can extract all the undamaged members from the tar.lz
|
||||||
archive, skipping over the damaged members, just like the standard
|
archive, skipping over the damaged members, just like the standard
|
||||||
(uncompressed) tar. Moreover, lziprecover can be used to recover at
|
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
|
||||||
least part of the contents of the damaged members.
|
used to recover as much data as possible from each damaged member,
|
||||||
|
and lziprecover can be used to recover some of the damaged members.
|
||||||
|
|
||||||
* A multimember tar.lz archive is usually smaller than the
|
* A multimember tar.lz archive is usually smaller than the
|
||||||
corresponding solidly compressed tar.gz archive, except when
|
corresponding solidly compressed tar.gz archive, except when
|
||||||
individually compressing files smaller than about 32 KiB.
|
individually compressing files smaller than about 32 KiB.
|
||||||
|
|
||||||
|
Note that the posix pax format has a serious flaw. The metadata stored
|
||||||
|
in pax extended records are not protected by any kind of check sequence.
|
||||||
|
Corruption in a long filename may cause the extraction of the file in the
|
||||||
|
wrong place without warning. Corruption in a long file size may cause the
|
||||||
|
truncation of the file or the appending of garbage to the file, both
|
||||||
|
followed by a spurious warning about a corrupt header far from the place
|
||||||
|
of the undetected corruption.
|
||||||
|
|
||||||
|
Metadata like filename and file size must always be protected in an archive
|
||||||
|
format because of the adverse effects of undetected corruption in them,
|
||||||
|
potentially much worse than undetected corruption in the data. Even more so
|
||||||
|
in the case of pax because the amount of metadata it stores is potentially
|
||||||
|
large, making undetected corruption more probable.
|
||||||
|
|
||||||
|
Because of the above, tarlz protects the extended records with a CRC in
|
||||||
|
a way compatible with standard tar tools.
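The ChangeLog names the check sequence as CRC32-C (Castagnoli). As a rough illustration of that checksum, here is a bitwise sketch using the reflected polynomial 0x82F63B78. It is not the table-driven CRC32C class used by tarlz, it does not reproduce how tarlz treats the CRC field itself inside the record, and the name `crc32c_bitwise` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC32-C (Castagnoli): reflected polynomial 0x82F63B78,
// initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
uint32_t crc32c_bitwise( const uint8_t * const buf, const std::size_t size )
  {
  uint32_t crc = 0xFFFFFFFFU;
  for( std::size_t i = 0; i < size; ++i )
    {
    crc ^= buf[i];
    for( int k = 0; k < 8; ++k )          // process one bit at a time
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0x82F63B78U : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFU;
  }
```

For the ASCII string "123456789" this yields 0xE3069283, the standard check value for CRC-32C. Any change to a byte in the protected records changes the result, which is what allows corruption in a long filename or file size to be detected.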
|
||||||
|
|
||||||
|
Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.
|
||||||
|
|
||||||
|
Tarlz is intended as a showcase project for the maintainers of real tar
|
||||||
|
programs to evaluate the format and perhaps implement it in their tools.
|
||||||
|
|
||||||
|
The diagram below shows the correspondence between each tar member
|
||||||
|
(formed by one or two headers plus optional data) in the tar archive and
|
||||||
|
each lzip member in the resulting multimember tar.lz archive:
|
||||||
|
|
||||||
|
tar
|
||||||
|
+========+======+=================+===============+========+======+========+
|
||||||
|
| header | data | extended header | extended data | header | data | EOF |
|
||||||
|
+========+======+=================+===============+========+======+========+
|
||||||
|
|
||||||
|
tar.lz
|
||||||
|
+===============+=================================================+========+
|
||||||
|
| member | member | member |
|
||||||
|
+===============+=================================================+========+
|
||||||
|
|
||||||
|
|
||||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||||
|
|
||||||
|
|
12
configure
vendored
|
@ -6,7 +6,7 @@
|
||||||
# to copy, distribute and modify it.
|
# to copy, distribute and modify it.
|
||||||
|
|
||||||
pkgname=tarlz
|
pkgname=tarlz
|
||||||
pkgversion=0.4
|
pkgversion=0.8
|
||||||
progname=tarlz
|
progname=tarlz
|
||||||
srctrigger=doc/${pkgname}.texi
|
srctrigger=doc/${pkgname}.texi
|
||||||
|
|
||||||
|
@ -70,6 +70,7 @@ while [ $# != 0 ] ; do
|
||||||
echo " CXX=COMPILER C++ compiler to use [${CXX}]"
|
echo " CXX=COMPILER C++ compiler to use [${CXX}]"
|
||||||
echo " CPPFLAGS=OPTIONS command line options for the preprocessor [${CPPFLAGS}]"
|
echo " CPPFLAGS=OPTIONS command line options for the preprocessor [${CPPFLAGS}]"
|
||||||
echo " CXXFLAGS=OPTIONS command line options for the C++ compiler [${CXXFLAGS}]"
|
echo " CXXFLAGS=OPTIONS command line options for the C++ compiler [${CXXFLAGS}]"
|
||||||
|
echo " CXXFLAGS+=OPTIONS append options to the current value of CXXFLAGS"
|
||||||
echo " LDFLAGS=OPTIONS command line options for the linker [${LDFLAGS}]"
|
echo " LDFLAGS=OPTIONS command line options for the linker [${LDFLAGS}]"
|
||||||
echo
|
echo
|
||||||
exit 0 ;;
|
exit 0 ;;
|
||||||
|
@ -93,10 +94,11 @@ while [ $# != 0 ] ; do
|
||||||
--mandir=*) mandir=${optarg} ;;
|
--mandir=*) mandir=${optarg} ;;
|
||||||
--no-create) no_create=yes ;;
|
--no-create) no_create=yes ;;
|
||||||
|
|
||||||
CXX=*) CXX=${optarg} ;;
|
CXX=*) CXX=${optarg} ;;
|
||||||
CPPFLAGS=*) CPPFLAGS=${optarg} ;;
|
CPPFLAGS=*) CPPFLAGS=${optarg} ;;
|
||||||
CXXFLAGS=*) CXXFLAGS=${optarg} ;;
|
CXXFLAGS=*) CXXFLAGS=${optarg} ;;
|
||||||
LDFLAGS=*) LDFLAGS=${optarg} ;;
|
CXXFLAGS+=*) CXXFLAGS="${CXXFLAGS} ${optarg}" ;;
|
||||||
|
LDFLAGS=*) LDFLAGS=${optarg} ;;
|
||||||
|
|
||||||
--*)
|
--*)
|
||||||
echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
|
echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
|
||||||
|
|
306
create.cc
|
@ -28,6 +28,10 @@
|
||||||
#include <stdint.h>
|
#include <stdint.h>
|
||||||
#include <unistd.h>
|
#include <unistd.h>
|
||||||
#include <sys/stat.h>
|
#include <sys/stat.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#if defined(__GNU_LIBRARY__)
|
||||||
|
#include <sys/sysmacros.h> // for major, minor
|
||||||
|
#endif
|
||||||
#include <ftw.h>
|
#include <ftw.h>
|
||||||
#include <grp.h>
|
#include <grp.h>
|
||||||
#include <pwd.h>
|
#include <pwd.h>
|
||||||
|
@ -37,6 +41,9 @@
|
||||||
#include "lzip.h"
|
#include "lzip.h"
|
||||||
#include "tarlz.h"
|
#include "tarlz.h"
|
||||||
|
|
||||||
|
|
||||||
|
const CRC32C crc32c;
|
||||||
|
|
||||||
int cl_owner = -1; // global vars needed by add_member
|
int cl_owner = -1; // global vars needed by add_member
|
||||||
int cl_group = -1;
|
int cl_group = -1;
|
||||||
int cl_solid = 0; // 1 = dsolid, 2 = asolid, 3 = solid
|
int cl_solid = 0; // 1 = dsolid, 2 = asolid, 3 = solid
|
||||||
|
@ -44,6 +51,7 @@ int cl_solid = 0; // 1 = dsolid, 2 = asolid, 3 = solid
|
||||||
namespace {
|
namespace {
|
||||||
|
|
||||||
LZ_Encoder * encoder = 0; // local vars needed by add_member
|
LZ_Encoder * encoder = 0; // local vars needed by add_member
|
||||||
|
const char * archive_namep = 0;
|
||||||
int outfd = -1;
|
int outfd = -1;
|
||||||
int gretval = 0;
|
int gretval = 0;
|
||||||
|
|
||||||
|
@ -55,31 +63,67 @@ int seek_read( const int fd, uint8_t * const buf, const int size,
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Check archive type, remove EOF blocks, and leave outfd file pos at EOF
|
// infd and outfd can refer to the same file if copying to a lower file
|
||||||
bool check_appendable()
|
// position or if source and destination blocks don't overlap.
|
||||||
|
// max_size < 0 means no size limit.
|
||||||
|
bool copy_file( const int infd, const int outfd, const long long max_size = -1 )
|
||||||
|
{
|
||||||
|
const int buffer_size = 65536;
|
||||||
|
// remaining number of bytes to copy
|
||||||
|
long long rest = ( ( max_size >= 0 ) ? max_size : buffer_size );
|
||||||
|
long long copied_size = 0;
|
||||||
|
uint8_t * const buffer = new uint8_t[buffer_size];
|
||||||
|
bool error = false;
|
||||||
|
|
||||||
|
while( rest > 0 )
|
||||||
|
{
|
||||||
|
const int size = std::min( (long long)buffer_size, rest );
|
||||||
|
if( max_size >= 0 ) rest -= size;
|
||||||
|
const int rd = readblock( infd, buffer, size );
|
||||||
|
if( rd != size && errno )
|
||||||
|
{ show_error( "Error reading input file", errno ); error = true; break; }
|
||||||
|
if( rd > 0 )
|
||||||
|
{
|
||||||
|
const int wr = writeblock( outfd, buffer, rd );
|
||||||
|
if( wr != rd )
|
||||||
|
{ show_error( "Error writing output file", errno );
|
||||||
|
error = true; break; }
|
||||||
|
copied_size += rd;
|
||||||
|
}
|
||||||
|
if( rd < size ) break; // EOF
|
||||||
|
}
|
||||||
|
delete[] buffer;
|
||||||
|
return ( !error && ( max_size < 0 || copied_size == max_size ) );
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/* Check archive type. If success, leave fd file pos at 0.
|
||||||
|
If remove_eof, leave fd file pos at beginning of the EOF blocks. */
|
||||||
|
bool check_appendable( const int fd, const bool remove_eof )
|
||||||
{
|
{
|
||||||
struct stat st;
|
struct stat st;
|
||||||
if( fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) ) return false;
|
if( fstat( fd, &st ) != 0 || !S_ISREG( st.st_mode ) ) return false;
|
||||||
uint8_t buf[header_size];
|
if( lseek( fd, 0, SEEK_SET ) != 0 ) return false;
|
||||||
int rd = readblock( outfd, buf, header_size );
|
enum { bufsize = header_size + ( header_size / 8 ) };
|
||||||
|
uint8_t buf[bufsize];
|
||||||
|
int rd = readblock( fd, buf, bufsize );
|
||||||
if( rd == 0 && errno == 0 ) return true; // append to empty archive
|
if( rd == 0 && errno == 0 ) return true; // append to empty archive
|
||||||
if( rd < min_member_size || ( rd != header_size && errno ) ) return false;
|
if( rd < min_member_size || ( rd != bufsize && errno ) ) return false;
|
||||||
const Lzip_header * const p = (Lzip_header *)buf; // shut up gcc
|
const Lzip_header * const p = (const Lzip_header *)buf; // shut up gcc
|
||||||
if( !p->verify_magic() ) return false;
|
if( !p->verify_magic() ) return false;
|
||||||
LZ_Decoder * decoder = LZ_decompress_open(); // decompress first header
|
LZ_Decoder * decoder = LZ_decompress_open(); // decompress first header
|
||||||
if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ||
|
if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ||
|
||||||
LZ_decompress_write( decoder, buf, rd ) != rd ||
|
LZ_decompress_write( decoder, buf, rd ) != rd ||
|
||||||
( rd = LZ_decompress_read( decoder, buf, header_size ) ) <
|
( rd = LZ_decompress_read( decoder, buf, header_size ) ) != header_size )
|
||||||
magic_o + magic_l )
|
|
||||||
{ LZ_decompress_close( decoder ); return false; }
|
{ LZ_decompress_close( decoder ); return false; }
|
||||||
LZ_decompress_close( decoder );
|
LZ_decompress_close( decoder );
|
||||||
const bool maybe_eof = ( buf[0] == 0 );
|
const bool maybe_eof = ( buf[0] == 0 );
|
||||||
if( !verify_ustar_chksum( buf ) && !maybe_eof ) return false;
|
if( !verify_ustar_chksum( buf ) && !maybe_eof ) return false;
|
||||||
const long long end = lseek( outfd, 0, SEEK_END );
|
const long long end = lseek( fd, 0, SEEK_END );
|
||||||
if( end < min_member_size ) return false;
|
if( end < min_member_size ) return false;
|
||||||
|
|
||||||
Lzip_trailer trailer;
|
Lzip_trailer trailer;
|
||||||
if( seek_read( outfd, trailer.data, Lzip_trailer::size,
|
if( seek_read( fd, trailer.data, Lzip_trailer::size,
|
||||||
end - Lzip_trailer::size ) != Lzip_trailer::size )
|
end - Lzip_trailer::size ) != Lzip_trailer::size )
|
||||||
return false;
|
return false;
|
||||||
const long long member_size = trailer.member_size();
|
const long long member_size = trailer.member_size();
|
||||||
|
@ -87,9 +131,8 @@ bool check_appendable()
|
||||||
( maybe_eof && member_size != end ) ) return false;
|
( maybe_eof && member_size != end ) ) return false;
|
||||||
|
|
||||||
Lzip_header header;
|
Lzip_header header;
|
||||||
if( seek_read( outfd, header.data, Lzip_header::size,
|
if( seek_read( fd, header.data, Lzip_header::size,
|
||||||
end - member_size ) != Lzip_header::size )
|
end - member_size ) != Lzip_header::size ) return false;
|
||||||
return false;
|
|
||||||
if( !header.verify_magic() || !isvalid_ds( header.dictionary_size() ) )
|
if( !header.verify_magic() || !isvalid_ds( header.dictionary_size() ) )
|
||||||
return false;
|
return false;
|
||||||
|
|
||||||
|
@ -102,12 +145,33 @@ bool check_appendable()
|
||||||
crc ^= 0xFFFFFFFFU;
|
crc ^= 0xFFFFFFFFU;
|
||||||
if( crc != data_crc ) return false;
|
if( crc != data_crc ) return false;
|
||||||
|
|
||||||
if( lseek( outfd, end - member_size, SEEK_SET ) != end - member_size ||
|
const long long pos = remove_eof ? end - member_size : 0;
|
||||||
ftruncate( outfd, end - member_size ) != 0 ) return false;
|
return ( lseek( fd, pos, SEEK_SET ) == pos );
|
||||||
return true;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class File_is_archive
|
||||||
|
{
|
||||||
|
dev_t archive_dev;
|
||||||
|
ino_t archive_ino;
|
||||||
|
bool initialized;
|
||||||
|
public:
|
||||||
|
File_is_archive() : initialized( false ) {}
|
||||||
|
bool init()
|
||||||
|
{
|
||||||
|
struct stat st;
|
||||||
|
if( fstat( outfd, &st ) != 0 ) return false;
|
||||||
|
if( S_ISREG( st.st_mode ) )
|
||||||
|
{ archive_dev = st.st_dev; archive_ino = st.st_ino; initialized = true; }
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
bool operator()( const struct stat & st ) const
|
||||||
|
{
|
||||||
|
return initialized && archive_dev == st.st_dev && archive_ino == st.st_ino;
|
||||||
|
}
|
||||||
|
} file_is_archive;
|
||||||
|
|
||||||
|
|
||||||
bool archive_write( const uint8_t * const buf, const int size )
|
bool archive_write( const uint8_t * const buf, const int size )
|
||||||
{
|
{
|
||||||
if( !encoder ) // uncompressed
|
if( !encoder ) // uncompressed
|
||||||
|
@ -121,9 +185,10 @@ bool archive_write( const uint8_t * const buf, const int size )
|
||||||
const int wr = LZ_compress_write( encoder, buf + sz, size - sz );
|
const int wr = LZ_compress_write( encoder, buf + sz, size - sz );
|
||||||
if( wr < 0 ) internal_error( "library error (LZ_compress_write)." );
|
if( wr < 0 ) internal_error( "library error (LZ_compress_write)." );
|
||||||
sz += wr;
|
sz += wr;
|
||||||
|
if( sz >= size && size > 0 ) break; // minimize dictionary size
|
||||||
const int rd = LZ_compress_read( encoder, obuf, obuf_size );
|
const int rd = LZ_compress_read( encoder, obuf, obuf_size );
|
||||||
if( rd < 0 ) internal_error( "library error (LZ_compress_read)." );
|
if( rd < 0 ) internal_error( "library error (LZ_compress_read)." );
|
||||||
if( rd == 0 && sz == size ) break;
|
if( rd == 0 && sz >= size ) break;
|
||||||
if( writeblock( outfd, obuf, rd ) != rd ) return false;
|
if( writeblock( outfd, obuf, rd ) != rd ) return false;
|
||||||
}
|
}
|
||||||
if( LZ_compress_finished( encoder ) == 1 &&
|
if( LZ_compress_finished( encoder ) == 1 &&
|
||||||
|
@ -133,11 +198,98 @@ bool archive_write( const uint8_t * const buf, const int size )
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void init_tar_header( Tar_header header ) // set magic and version
|
||||||
|
{
|
||||||
|
std::memset( header, 0, header_size );
|
||||||
|
std::memcpy( header + magic_o, ustar_magic, magic_l - 1 );
|
||||||
|
header[version_o] = header[version_o+1] = '0';
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
unsigned char xdigit( const unsigned value )
|
||||||
|
{
|
||||||
|
if( value <= 9 ) return '0' + value;
|
||||||
|
if( value <= 15 ) return 'A' + value - 10;
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
void print_hex( char * const buf, int size, unsigned long long num )
|
||||||
|
{
|
||||||
|
while( --size >= 0 ) { buf[size] = xdigit( num & 0x0F ); num >>= 4; }
|
||||||
|
}
|
||||||
|
|
||||||
void print_octal( char * const buf, int size, unsigned long long num )
|
void print_octal( char * const buf, int size, unsigned long long num )
|
||||||
{
|
{
|
||||||
while( --size >= 0 ) { buf[size] = '0' + ( num % 8 ); num /= 8; }
|
while( --size >= 0 ) { buf[size] = '0' + ( num % 8 ); num /= 8; }
|
||||||
}
|
}
|
||||||
|
|
||||||
|
unsigned decimal_digits( unsigned long long value )
|
||||||
|
{
|
||||||
|
unsigned digits = 1;
|
||||||
|
while( value >= 10 ) { value /= 10; ++digits; }
|
||||||
|
return digits;
|
||||||
|
}
|
||||||
|
|
||||||
|
unsigned long long record_size( const unsigned keyword_size,
|
||||||
|
const unsigned long long value_size )
|
||||||
|
{
|
||||||
|
// size = ' ' + keyword + '=' + value + '\n'
|
||||||
|
const unsigned long long size = 1 + keyword_size + 1 + value_size + 1;
|
||||||
|
const unsigned d1 = decimal_digits( size );
|
||||||
|
return decimal_digits( d1 + size ) + size;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool write_extended( const Extended & extended )
|
||||||
|
{
|
||||||
|
const int path_rec = extended.path.size() ?
|
||||||
|
record_size( 4, extended.path.size() ) : 0;
|
||||||
|
const int lpath_rec = extended.linkpath.size() ?
|
||||||
|
record_size( 8, extended.linkpath.size() ) : 0;
|
||||||
|
const int size_rec = ( extended.size > 0 ) ?
|
||||||
|
record_size( 4, decimal_digits( extended.size ) ) : 0;
|
||||||
|
const unsigned long long edsize = path_rec + lpath_rec + size_rec + 22;
|
||||||
|
const unsigned long long bufsize = round_up( edsize );
|
||||||
|
if( edsize >= 1ULL << 33 ) return false; // too much extended data
|
||||||
|
if( bufsize == 0 ) return edsize == 0; // overflow or no extended data
|
||||||
|
char * const buf = new char[bufsize+1]; // extended records buffer
|
||||||
|
unsigned long long pos = path_rec; // goto can't cross this
|
||||||
|
if( path_rec && snprintf( buf, path_rec + 1, "%d path=%s\n",
|
||||||
|
path_rec, extended.path.c_str() ) != path_rec )
|
||||||
|
goto error;
|
||||||
|
if( lpath_rec && snprintf( buf + pos, lpath_rec + 1, "%d linkpath=%s\n",
|
||||||
|
lpath_rec, extended.linkpath.c_str() ) != lpath_rec )
|
||||||
|
goto error;
|
||||||
|
pos += lpath_rec;
|
||||||
|
if( size_rec && snprintf( buf + pos, size_rec + 1, "%d size=%llu\n",
|
||||||
|
size_rec, extended.size ) != size_rec )
|
||||||
|
goto error;
|
||||||
|
pos += size_rec;
|
||||||
|
if( snprintf( buf + pos, 23, "22 GNU.crc32=00000000\n" ) != 22 ) goto error;
|
||||||
|
pos += 22;
|
||||||
|
if( pos != edsize ) goto error;
|
||||||
|
print_hex( buf + edsize - 9, 8,
|
||||||
|
crc32c.windowed_crc( (const uint8_t *)buf, edsize - 9, edsize ) );
|
||||||
|
std::memset( buf + edsize, 0, bufsize - edsize ); // wipe padding
|
||||||
|
Tar_header header; // extended header
|
||||||
|
init_tar_header( header );
|
||||||
|
header[typeflag_o] = tf_extended; // fill only required fields
|
||||||
|
print_octal( header + size_o, size_l - 1, edsize );
|
||||||
|
print_octal( header + chksum_o, chksum_l - 1,
|
||||||
|
ustar_chksum( (const uint8_t *)header ) );
|
||||||
|
if( !archive_write( (const uint8_t *)header, header_size ) ) goto error;
|
||||||
|
for( pos = 0; pos < bufsize; ) // write extended records to archive
|
||||||
|
{
|
||||||
|
int size = std::min( bufsize - pos, 1ULL << 20 );
|
||||||
|
if( !archive_write( (const uint8_t *)buf + pos, size ) ) goto error;
|
||||||
|
pos += size;
|
||||||
|
}
|
||||||
|
delete[] buf;
|
||||||
|
return true;
|
||||||
|
error:
|
||||||
|
delete[] buf;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
const char * remove_leading_dotdot( const char * const filename )
|
const char * remove_leading_dotdot( const char * const filename )
|
||||||
{
|
{
|
||||||
|
@ -164,24 +316,31 @@ const char * remove_leading_dotdot( const char * const filename )
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
bool split_name( const char * const filename, Tar_header header )
|
// Return true if filename fits in the ustar header.
|
||||||
|
bool store_name( const char * const filename, Extended & extended,
|
||||||
|
Tar_header header )
|
||||||
{
|
{
|
||||||
const char * const stored_name = remove_leading_dotdot( filename );
|
const char * const stored_name = remove_leading_dotdot( filename );
|
||||||
const int len = std::strlen( stored_name );
|
const int len = std::strlen( stored_name );
|
||||||
enum { max_len = prefix_l + 1 + name_l }; // prefix + '/' + name
|
enum { max_len = prefix_l + 1 + name_l }; // prefix + '/' + name
|
||||||
|
|
||||||
|
// first try storing filename in the ustar header
|
||||||
if( len <= name_l ) // stored_name fits in name
|
if( len <= name_l ) // stored_name fits in name
|
||||||
{ std::memcpy( header + name_o, stored_name, len ); return true; }
|
{ std::memcpy( header + name_o, stored_name, len ); return true; }
|
||||||
if( len <= max_len ) // find shortest prefix
|
if( len <= max_len ) // find shortest prefix
|
||||||
for( int i = len - name_l - 1; i < len && i <= prefix_l; ++i )
|
for( int i = len - name_l - 1; i < len && i <= prefix_l; ++i )
|
||||||
if( stored_name[i] == '/' )
|
if( stored_name[i] == '/' ) // stored_name can be split
|
||||||
{
|
{
|
||||||
std::memcpy( header + name_o, stored_name + i + 1, len - i - 1 );
|
std::memcpy( header + name_o, stored_name + i + 1, len - i - 1 );
|
||||||
std::memcpy( header + prefix_o, stored_name, i );
|
std::memcpy( header + prefix_o, stored_name, i );
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
// store filename in extended record, leave name zeroed in ustar header
|
||||||
|
extended.path = stored_name;
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
int add_member( const char * const filename, const struct stat *,
|
int add_member( const char * const filename, const struct stat *,
|
||||||
const int flag, struct FTW * )
|
const int flag, struct FTW * )
|
||||||
{
|
{
|
||||||
|
@ -189,11 +348,13 @@ int add_member( const char * const filename, const struct stat *,
|
||||||
if( lstat( filename, &st ) != 0 )
|
if( lstat( filename, &st ) != 0 )
|
||||||
{ show_file_error( filename, "Can't stat input file", errno );
|
{ show_file_error( filename, "Can't stat input file", errno );
|
||||||
gretval = 1; return 0; }
|
gretval = 1; return 0; }
|
||||||
|
if( file_is_archive( st ) )
|
||||||
|
{ show_file_error( archive_namep, "File is the archive; not dumped." );
|
||||||
|
return 0; }
|
||||||
|
Extended extended; // metadata for extended records
|
||||||
Tar_header header;
|
Tar_header header;
|
||||||
std::memset( header, 0, header_size );
|
init_tar_header( header );
|
||||||
if( !split_name( filename, header ) )
|
store_name( filename, extended, header );
|
||||||
{ show_file_error( filename, "File name is too long." );
|
|
||||||
gretval = 2; return 0; }
|
|
||||||
|
|
||||||
const mode_t mode = st.st_mode;
|
const mode_t mode = st.st_mode;
|
||||||
print_octal( header + mode_o, mode_l - 1,
|
print_octal( header + mode_o, mode_l - 1,
|
||||||
|
@ -201,10 +362,17 @@ int add_member( const char * const filename, const struct stat *,
|
||||||
S_IRWXU | S_IRWXG | S_IRWXO ) );
|
S_IRWXU | S_IRWXG | S_IRWXO ) );
|
||||||
const uid_t uid = ( cl_owner >= 0 ) ? (uid_t)cl_owner : st.st_uid;
|
const uid_t uid = ( cl_owner >= 0 ) ? (uid_t)cl_owner : st.st_uid;
|
||||||
const gid_t gid = ( cl_group >= 0 ) ? (gid_t)cl_group : st.st_gid;
|
const gid_t gid = ( cl_group >= 0 ) ? (gid_t)cl_group : st.st_gid;
|
||||||
|
if( uid >= 2 << 20 || gid >= 2 << 20 )
|
||||||
|
{ show_file_error( filename, "uid or gid is larger than 2_097_151." );
|
||||||
|
gretval = 1; return 0; }
|
||||||
print_octal( header + uid_o, uid_l - 1, uid );
|
print_octal( header + uid_o, uid_l - 1, uid );
|
||||||
print_octal( header + gid_o, gid_l - 1, gid );
|
print_octal( header + gid_o, gid_l - 1, gid );
|
||||||
|
const long long mtime = st.st_mtime; // shut up gcc
|
||||||
|
if( mtime < 0 || mtime >= 1LL << 33 )
|
||||||
|
{ show_file_error( filename, "mtime is out of ustar range [0, 8_589_934_591]." );
|
||||||
|
gretval = 1; return 0; }
|
||||||
|
print_octal( header + mtime_o, mtime_l - 1, mtime );
|
||||||
unsigned long long file_size = 0;
|
unsigned long long file_size = 0;
|
||||||
print_octal( header + mtime_o, mtime_l - 1, st.st_mtime );
|
|
||||||
Typeflag typeflag;
|
Typeflag typeflag;
|
||||||
if( S_ISREG( mode ) ) { typeflag = tf_regular; file_size = st.st_size; }
|
if( S_ISREG( mode ) ) { typeflag = tf_regular; file_size = st.st_size; }
|
||||||
else if( S_ISDIR( mode ) )
|
else if( S_ISDIR( mode ) )
|
||||||
|
@ -217,16 +385,26 @@ int add_member( const char * const filename, const struct stat *,
|
||||||
else if( S_ISLNK( mode ) )
|
else if( S_ISLNK( mode ) )
|
||||||
{
|
{
|
||||||
typeflag = tf_symlink;
|
typeflag = tf_symlink;
|
||||||
if( st.st_size > linkname_l ||
|
long len;
|
||||||
readlink( filename, header + linkname_o, linkname_l ) != st.st_size )
|
if( st.st_size <= linkname_l )
|
||||||
|
len = readlink( filename, header + linkname_o, linkname_l );
|
||||||
|
else
|
||||||
{
|
{
|
||||||
show_file_error( filename, "Link destination name is too long." );
|
char * const buf = new char[st.st_size+1];
|
||||||
gretval = 2; return 0;
|
len = readlink( filename, buf, st.st_size );
|
||||||
|
if( len == st.st_size ) { buf[len] = 0; extended.linkpath = buf; }
|
||||||
|
delete[] buf;
|
||||||
}
|
}
|
||||||
|
if( len != st.st_size )
|
||||||
|
{ show_file_error( filename, "Error reading link", (len < 0) ? errno : 0 );
|
||||||
|
gretval = 1; return 0; }
|
||||||
}
|
}
|
||||||
else if( S_ISCHR( mode ) || S_ISBLK( mode ) )
|
else if( S_ISCHR( mode ) || S_ISBLK( mode ) )
|
||||||
{
|
{
|
||||||
typeflag = S_ISCHR( mode ) ? tf_chardev : tf_blockdev;
|
typeflag = S_ISCHR( mode ) ? tf_chardev : tf_blockdev;
|
||||||
|
if( major( st.st_dev ) >= 2 << 20 || minor( st.st_dev ) >= 2 << 20 )
|
||||||
|
{ show_file_error( filename, "devmajor or devminor is larger than 2_097_151." );
|
||||||
|
gretval = 1; return 0; }
|
||||||
print_octal( header + devmajor_o, devmajor_l - 1, major( st.st_dev ) );
|
print_octal( header + devmajor_o, devmajor_l - 1, major( st.st_dev ) );
|
||||||
print_octal( header + devminor_o, devminor_l - 1, minor( st.st_dev ) );
|
print_octal( header + devminor_o, devminor_l - 1, minor( st.st_dev ) );
|
||||||
}
|
}
|
||||||
|
@ -234,22 +412,23 @@ int add_member( const char * const filename, const struct stat *,
|
||||||
else { show_file_error( filename, "Unknown file type." );
|
else { show_file_error( filename, "Unknown file type." );
|
||||||
gretval = 2; return 0; }
|
gretval = 2; return 0; }
|
||||||
header[typeflag_o] = typeflag;
|
header[typeflag_o] = typeflag;
|
||||||
std::memcpy( header + magic_o, ustar_magic, magic_l - 1 );
|
|
||||||
header[version_o] = header[version_o+1] = '0';
|
|
||||||
const struct passwd * const pw = getpwuid( uid );
|
const struct passwd * const pw = getpwuid( uid );
|
||||||
if( pw && pw->pw_name )
|
if( pw && pw->pw_name )
|
||||||
std::strncpy( header + uname_o, pw->pw_name, uname_l - 1 );
|
std::strncpy( header + uname_o, pw->pw_name, uname_l - 1 );
|
||||||
const struct group * const gr = getgrgid( gid );
|
const struct group * const gr = getgrgid( gid );
|
||||||
if( gr && gr->gr_name )
|
if( gr && gr->gr_name )
|
||||||
std::strncpy( header + gname_o, gr->gr_name, gname_l - 1 );
|
std::strncpy( header + gname_o, gr->gr_name, gname_l - 1 );
|
||||||
print_octal( header + size_o, size_l - 1, file_size );
|
if( file_size >= 1ULL << 33 ) extended.size = file_size;
|
||||||
|
else print_octal( header + size_o, size_l - 1, file_size );
|
||||||
print_octal( header + chksum_o, chksum_l - 1,
|
print_octal( header + chksum_o, chksum_l - 1,
|
||||||
ustar_chksum( (const uint8_t *)header ) );
|
ustar_chksum( (const uint8_t *)header ) );
|
||||||
|
|
||||||
const int infd = file_size ? open_instream( filename ) : -1;
|
const int infd = file_size ? open_instream( filename ) : -1;
|
||||||
if( file_size && infd < 0 ) { gretval = 1; return 0; }
|
if( file_size && infd < 0 ) { gretval = 1; return 0; }
|
||||||
|
if( !extended.empty() && !write_extended( extended ) )
|
||||||
|
{ show_error( "Error writing extended header", errno ); return 1; }
|
||||||
if( !archive_write( (const uint8_t *)header, header_size ) )
|
if( !archive_write( (const uint8_t *)header, header_size ) )
|
||||||
{ show_error( "Error writing archive header", errno ); return 1; }
|
{ show_error( "Error writing ustar header", errno ); return 1; }
|
||||||
if( file_size )
|
if( file_size )
|
||||||
{
|
{
|
||||||
enum { bufsize = 32 * header_size };
|
enum { bufsize = 32 * header_size };
|
||||||
|
@ -304,6 +483,49 @@ bool verify_ustar_chksum( const uint8_t * const buf )
|
||||||
ustar_chksum( buf ) == strtoul( (const char *)buf + chksum_o, 0, 8 ) ); }
|
ustar_chksum( buf ) == strtoul( (const char *)buf + chksum_o, 0, 8 ) ); }
|
||||||
|
|
||||||
|
|
||||||
|
int concatenate( const std::string & archive_name, const Arg_parser & parser,
|
||||||
|
const int filenames )
|
||||||
|
{
|
||||||
|
if( !filenames )
|
||||||
|
{ if( verbosity >= 1 ) show_error( "Nothing to concatenate." ); return 0; }
|
||||||
|
if( archive_name.empty() )
|
||||||
|
{ show_error( "'--concatenate' is incompatible with '-f -'.", 0, true );
|
||||||
|
return 1; }
|
||||||
|
if( ( outfd = open_outstream( archive_name, false ) ) < 0 ) return 1;
|
||||||
|
if( !file_is_archive.init() )
|
||||||
|
{ show_file_error( archive_name.c_str(), "Can't stat", errno ); return 1; }
|
||||||
|
|
||||||
|
int retval = 0;
|
||||||
|
for( int i = 0; i < parser.arguments(); ++i ) // copy archives
|
||||||
|
{
|
||||||
|
if( parser.code( i ) ) continue; // skip options
|
||||||
|
const char * const filename = parser.argument( i ).c_str();
|
||||||
|
const int infd = open_instream( filename );
|
||||||
|
if( infd < 0 )
|
||||||
|
{ show_file_error( filename, "Can't open input file", errno );
|
||||||
|
retval = 1; break; }
|
||||||
|
if( !check_appendable( infd, false ) )
|
||||||
|
{ show_file_error( filename, "Not an appendable tar.lz archive." );
|
||||||
|
close( infd ); retval = 2; break; }
|
||||||
|
struct stat st;
|
||||||
|
if( fstat( infd, &st ) == 0 && file_is_archive( st ) )
|
||||||
|
{ show_file_error( filename, "File is the archive; not concatenated." );
|
||||||
|
close( infd ); continue; }
|
||||||
|
if( !check_appendable( outfd, true ) )
|
||||||
|
{ show_error( "This does not look like an appendable tar.lz archive." );
|
||||||
|
close( infd ); retval = 2; break; }
|
||||||
|
if( !copy_file( infd, outfd ) || close( infd ) != 0 )
|
||||||
|
{ show_file_error( filename, "Error copying archive", errno );
|
||||||
|
retval = 1; break; }
|
||||||
|
if( verbosity >= 1 ) std::fprintf( stderr, "%s\n", filename );
|
||||||
|
}
|
||||||
|
|
||||||
|
if( close( outfd ) != 0 && !retval )
|
||||||
|
{ show_error( "Error closing archive", errno ); retval = 1; }
|
||||||
|
return retval;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
int encode( const std::string & archive_name, const Arg_parser & parser,
|
int encode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
const int filenames, const int level, const bool append )
|
const int filenames, const int level, const bool append )
|
||||||
{
|
{
|
||||||
|
@ -345,11 +567,15 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
{ show_error( "'--append' is incompatible with '--uncompressed'.", 0, true );
|
{ show_error( "'--append' is incompatible with '--uncompressed'.", 0, true );
|
||||||
return 1; }
|
return 1; }
|
||||||
if( ( outfd = open_outstream( archive_name, false ) ) < 0 ) return 1;
|
if( ( outfd = open_outstream( archive_name, false ) ) < 0 ) return 1;
|
||||||
if( !check_appendable() )
|
if( !check_appendable( outfd, true ) )
|
||||||
{ show_error( "This does not look like an appendable tar.lz archive." );
|
{ show_error( "This does not look like an appendable tar.lz archive." );
|
||||||
return 2; }
|
return 2; }
|
||||||
}
|
}
|
||||||
|
|
||||||
|
archive_namep = archive_name.size() ? archive_name.c_str() : "(stdout)";
|
||||||
|
if( !file_is_archive.init() )
|
||||||
|
{ show_file_error( archive_namep, "Can't stat", errno ); return 1; }
|
||||||
|
|
||||||
if( compressed )
|
if( compressed )
|
||||||
{
|
{
|
||||||
encoder = LZ_compress_open( option_mapping[level].dictionary_size,
|
encoder = LZ_compress_open( option_mapping[level].dictionary_size,
|
||||||
|
@ -365,7 +591,6 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
}
|
}
|
||||||
|
|
||||||
int retval = 0;
|
int retval = 0;
|
||||||
std::string deslashed; // arg without trailing slashes
|
|
||||||
for( int i = 0; i < parser.arguments(); ++i ) // write members
|
for( int i = 0; i < parser.arguments(); ++i ) // write members
|
||||||
{
|
{
|
||||||
const int code = parser.code( i );
|
const int code = parser.code( i );
|
||||||
|
@ -375,6 +600,7 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
{ show_file_error( filename, "Error changing working directory", errno );
|
{ show_file_error( filename, "Error changing working directory", errno );
|
||||||
retval = 1; break; }
|
retval = 1; break; }
|
||||||
if( code ) continue; // skip options
|
if( code ) continue; // skip options
|
||||||
|
std::string deslashed; // arg without trailing slashes
|
||||||
unsigned len = arg.size();
|
unsigned len = arg.size();
|
||||||
while( len > 1 && arg[len-1] == '/' ) --len;
|
while( len > 1 && arg[len-1] == '/' ) --len;
|
||||||
if( len < arg.size() )
|
if( len < arg.size() )
|
||||||
|
@ -391,16 +617,18 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
|
|
||||||
if( !retval ) // write End-Of-Archive records
|
if( !retval ) // write End-Of-Archive records
|
||||||
{
|
{
|
||||||
uint8_t buf[header_size];
|
enum { bufsize = 2 * header_size };
|
||||||
std::memset( buf, 0, header_size );
|
uint8_t buf[bufsize];
|
||||||
|
std::memset( buf, 0, bufsize );
|
||||||
if( encoder && cl_solid == 2 && !archive_write( 0, 0 ) ) // flush encoder
|
if( encoder && cl_solid == 2 && !archive_write( 0, 0 ) ) // flush encoder
|
||||||
{ show_error( "Error flushing encoder", errno ); retval = 1; }
|
{ show_error( "Error flushing encoder", errno ); retval = 1; }
|
||||||
else if( !archive_write( buf, header_size ) ||
|
else if( !archive_write( buf, bufsize ) ||
|
||||||
!archive_write( buf, header_size ) ||
|
|
||||||
( encoder && !archive_write( 0, 0 ) ) ) // flush encoder
|
( encoder && !archive_write( 0, 0 ) ) ) // flush encoder
|
||||||
{ show_error( "Error writing end-of-archive blocks", errno );
|
{ show_error( "Error writing end-of-archive blocks", errno );
|
||||||
retval = 1; }
|
retval = 1; }
|
||||||
}
|
}
|
||||||
|
if( encoder && LZ_compress_close( encoder ) < 0 )
|
||||||
|
{ show_error( "LZ_compress_close failed." ); retval = 1; }
|
||||||
if( close( outfd ) != 0 && !retval )
|
if( close( outfd ) != 0 && !retval )
|
||||||
{ show_error( "Error closing archive", errno ); retval = 1; }
|
{ show_error( "Error closing archive", errno ); retval = 1; }
|
||||||
if( retval && archive_name.size() && !append )
|
if( retval && archive_name.size() && !append )
|
||||||
|
|
41
doc/tarlz.1
|
@ -1,12 +1,25 @@
|
||||||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||||
.TH TARLZ "1" "April 2018" "tarlz 0.4" "User Commands"
|
.TH TARLZ "1" "December 2018" "tarlz 0.8" "User Commands"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
tarlz \- creates tar archives with multimember lzip compression
|
tarlz \- creates tar archives with multimember lzip compression
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
.B tarlz
|
.B tarlz
|
||||||
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||||
.SH DESCRIPTION
|
.SH DESCRIPTION
|
||||||
Tarlz \- Archiver with multimember lzip compression.
|
Tarlz is a small and simple implementation of the tar archiver. By default
|
||||||
|
tarlz creates, lists and extracts archives in a simplified posix pax format
|
||||||
|
compressed with lzip on a per file basis. Each tar member is compressed in
|
||||||
|
its own lzip member, as well as the end\-of\-file blocks. This method is fully
|
||||||
|
backward compatible with standard tar tools like GNU tar, which treat the
|
||||||
|
resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
|
||||||
|
can append files to the end of such compressed archives.
|
||||||
|
.PP
|
||||||
|
The tarlz file format is a safe posix\-style backup format. In case of
|
||||||
|
corruption, tarlz can extract all the undamaged members from the tar.lz
|
||||||
|
archive, skipping over the damaged members, just like the standard
|
||||||
|
(uncompressed) tar. Moreover, the option '\-\-keep\-damaged' can be used to
|
||||||
|
recover as much data as possible from each damaged member, and lziprecover
|
||||||
|
can be used to recover some of the damaged members.
|
||||||
.SH OPTIONS
|
.SH OPTIONS
|
||||||
.TP
|
.TP
|
||||||
\fB\-h\fR, \fB\-\-help\fR
|
\fB\-h\fR, \fB\-\-help\fR
|
||||||
|
@ -15,6 +28,9 @@ display this help and exit
|
||||||
\fB\-V\fR, \fB\-\-version\fR
|
\fB\-V\fR, \fB\-\-version\fR
|
||||||
output version information and exit
|
output version information and exit
|
||||||
.TP
|
.TP
|
||||||
|
\fB\-A\fR, \fB\-\-concatenate\fR
|
||||||
|
append tar.lz archives to the end of an archive
|
||||||
|
.TP
|
||||||
\fB\-c\fR, \fB\-\-create\fR
|
\fB\-c\fR, \fB\-\-create\fR
|
||||||
create a new archive
|
create a new archive
|
||||||
.TP
|
.TP
|
||||||
|
@ -48,17 +64,29 @@ create solidly compressed appendable archive
|
||||||
\fB\-\-dsolid\fR
|
\fB\-\-dsolid\fR
|
||||||
create per\-directory compressed archive
|
create per\-directory compressed archive
|
||||||
.TP
|
.TP
|
||||||
|
\fB\-\-no\-solid\fR
|
||||||
|
create per\-file compressed archive (default)
|
||||||
|
.TP
|
||||||
\fB\-\-solid\fR
|
\fB\-\-solid\fR
|
||||||
create solidly compressed archive
|
create solidly compressed archive
|
||||||
.TP
|
.TP
|
||||||
\fB\-\-group=\fR<group>
|
\fB\-\-anonymous\fR
|
||||||
use <group> name/id for added files
|
equivalent to '\-\-owner=root \fB\-\-group\fR=\fI\,root\/\fR'
|
||||||
.TP
|
.TP
|
||||||
\fB\-\-owner=\fR<owner>
|
\fB\-\-owner=\fR<owner>
|
||||||
use <owner> name/id for added files
|
use <owner> name/ID for files added
|
||||||
|
.TP
|
||||||
|
\fB\-\-group=\fR<group>
|
||||||
|
use <group> name/ID for files added
|
||||||
|
.TP
|
||||||
|
\fB\-\-keep\-damaged\fR
|
||||||
|
don't delete partially extracted files
|
||||||
|
.TP
|
||||||
|
\fB\-\-missing\-crc\fR
|
||||||
|
exit with error status if missing extended CRC
|
||||||
.TP
|
.TP
|
||||||
\fB\-\-uncompressed\fR
|
\fB\-\-uncompressed\fR
|
||||||
don't compress the created archive
|
don't compress the archive created
|
||||||
.PP
|
.PP
|
||||||
Exit status: 0 for a normal exit, 1 for environmental problems (file
|
Exit status: 0 for a normal exit, 1 for environmental problems (file
|
||||||
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
|
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
|
||||||
|
@ -70,6 +98,7 @@ Report bugs to lzip\-bug@nongnu.org
|
||||||
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
|
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
|
||||||
.SH COPYRIGHT
|
.SH COPYRIGHT
|
||||||
Copyright \(co 2018 Antonio Diaz Diaz.
|
Copyright \(co 2018 Antonio Diaz Diaz.
|
||||||
|
Using lzlib 1.11\-rc2
|
||||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||||
.br
|
.br
|
||||||
This is free software: you are free to change and redistribute it.
|
This is free software: you are free to change and redistribute it.
|
||||||
|
|
498
doc/tarlz.info
|
@ -11,15 +11,17 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
|
||||||
Tarlz Manual
|
Tarlz Manual
|
||||||
************
|
************
|
||||||
|
|
||||||
This manual is for Tarlz (version 0.4, 23 April 2018).
|
This manual is for Tarlz (version 0.8, 16 December 2018).
|
||||||
|
|
||||||
* Menu:
|
* Menu:
|
||||||
|
|
||||||
* Introduction:: Purpose and features of tarlz
|
* Introduction:: Purpose and features of tarlz
|
||||||
* Invoking tarlz:: Command line interface
|
* Invoking tarlz:: Command line interface
|
||||||
* Examples:: A small tutorial with examples
|
* File format:: Detailed format of the compressed archive
|
||||||
* Problems:: Reporting bugs
|
* Amendments to pax format:: The reasons for the differences with pax
|
||||||
* Concept index:: Index of concepts
|
* Examples:: A small tutorial with examples
|
||||||
|
* Problems:: Reporting bugs
|
||||||
|
* Concept index:: Index of concepts
|
||||||
|
|
||||||
|
|
||||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||||
|
@ -34,38 +36,17 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
|
||||||
**************
|
**************
|
||||||
|
|
||||||
Tarlz is a small and simple implementation of the tar archiver. By
|
Tarlz is a small and simple implementation of the tar archiver. By
|
||||||
default tarlz creates, lists and extracts archives in the 'ustar' format
|
default tarlz creates, lists and extracts archives in a simplified
|
||||||
compressed with lzip on a per file basis. Tarlz can append files to the
|
posix pax format compressed with lzip on a per file basis. Each tar
|
||||||
end of such compressed archives.
|
member is compressed in its own lzip member, as well as the end-of-file
|
||||||
|
blocks. This method is fully backward compatible with standard tar tools
|
||||||
Each tar member is compressed in its own lzip member, as well as the
|
|
||||||
end-of-file blocks. This same method works for any tar format (gnu,
|
|
||||||
ustar, posix) and is fully backward compatible with standard tar tools
|
|
||||||
like GNU tar, which treat the resulting multimember tar.lz archive like
|
like GNU tar, which treat the resulting multimember tar.lz archive like
|
||||||
any other tar.lz archive.
|
any other tar.lz archive. Tarlz can append files to the end of such
|
||||||
|
compressed archives.
|
||||||
|
|
||||||
Tarlz can create tar archives with four levels of compression
|
Tarlz can create tar archives with four levels of compression
|
||||||
granularity: per file, per directory, appendable solid, and solid.
|
granularity: per file, per directory, appendable solid, and solid.
|
||||||
|
|
||||||
Tarlz is intended as a showcase project for the maintainers of real
|
|
||||||
tar programs to evaluate the format and perhaps implement it in their
|
|
||||||
tools.
|
|
||||||
|
|
||||||
The diagram below shows the correspondence between tar members
|
|
||||||
(formed by a header plus optional data) in the tar archive and lzip
|
|
||||||
members in the resulting multimember tar.lz archive: *Note File format:
|
|
||||||
(lzip)File format.
|
|
||||||
|
|
||||||
tar
|
|
||||||
+========+======+========+======+========+======+========+
|
|
||||||
| header | data | header | data | header | data | eof |
|
|
||||||
+========+======+========+======+========+======+========+
|
|
||||||
|
|
||||||
tar.lz
|
|
||||||
+===============+===============+===============+========+
|
|
||||||
| member | member | member | member |
|
|
||||||
+===============+===============+===============+========+
|
|
||||||
|
|
||||||
Of course, compressing each file (or each directory) individually is
|
Of course, compressing each file (or each directory) individually is
|
||||||
less efficient than compressing the whole tar archive, but it has the
|
less efficient than compressing the whole tar archive, but it has the
|
||||||
following advantages:
|
following advantages:
|
||||||
|
@ -73,21 +54,32 @@ following advantages:
|
||||||
* The resulting multimember tar.lz archive can be decompressed in
|
* The resulting multimember tar.lz archive can be decompressed in
|
||||||
parallel with plzip, multiplying the decompression speed.
|
parallel with plzip, multiplying the decompression speed.
|
||||||
|
|
||||||
* New members can be appended to the archive (by removing the eof
|
* New members can be appended to the archive (by removing the EOF
|
||||||
member) just like to an uncompressed tar archive.
|
member) just like to an uncompressed tar archive.
|
||||||
|
|
||||||
* It is a safe posix-style backup format. In case of corruption,
|
* It is a safe posix-style backup format. In case of corruption,
|
||||||
tarlz can extract all the undamaged members from the tar.lz
|
tarlz can extract all the undamaged members from the tar.lz
|
||||||
archive, skipping over the damaged members, just like the standard
|
archive, skipping over the damaged members, just like the standard
|
||||||
(uncompressed) tar. Moreover, lziprecover can be used to recover at
|
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
|
||||||
least part of the contents of the damaged members.
|
used to recover as much data as possible from each damaged member,
|
||||||
|
and lziprecover can be used to recover some of the damaged members.
|
||||||
|
|
||||||
* A multimember tar.lz archive is usually smaller than the
|
* A multimember tar.lz archive is usually smaller than the
|
||||||
corresponding solidly compressed tar.gz archive, except when
|
corresponding solidly compressed tar.gz archive, except when
|
||||||
individually compressing files smaller than about 32 KiB.
|
individually compressing files smaller than about 32 KiB.
|
||||||
|
|
||||||
|
Tarlz protects the extended records with a CRC in a way compatible
|
||||||
|
with standard tar tools. *Note crc32::.
|
||||||
|
|
||||||
|
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
|
||||||
|
'star' or 'v7'.
|
||||||
|
|
||||||
|
Tarlz is intended as a showcase project for the maintainers of real
|
||||||
|
tar programs to evaluate the format and perhaps implement it in their
|
||||||
|
tools.
|
||||||
|
|
||||||
|
|
||||||
File: tarlz.info, Node: Invoking tarlz, Next: Examples, Prev: Introduction, Up: Top
|
File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
|
||||||
|
|
||||||
2 Invoking tarlz
|
2 Invoking tarlz
|
||||||
****************
|
****************
|
||||||
|
@ -97,9 +89,15 @@ The format for running tarlz is:
|
||||||
tarlz [OPTIONS] [FILES]
|
tarlz [OPTIONS] [FILES]
|
||||||
|
|
||||||
On archive creation or appending, tarlz removes leading and trailing
|
On archive creation or appending, tarlz removes leading and trailing
|
||||||
slashes from file names, as well as file name prefixes containing a
|
slashes from filenames, as well as filename prefixes containing a '..'
|
||||||
'..' component. On extraction, archive members containing a '..'
|
component. On extraction, archive members containing a '..' component
|
||||||
component are skipped.
|
are skipped. Tarlz detects when the archive being created or enlarged
|
||||||
|
is among the files to be dumped, appended or concatenated, and skips it.
|
||||||
|
|
||||||
|
On extraction and listing, tarlz removes leading './' strings from
|
||||||
|
member names in the archive or given in the command line, so that
|
||||||
|
'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
|
||||||
|
archive 'foo'.
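
The normalization of leading './' strings described above could look roughly like this. It is only a sketch under assumptions: the helper name `remove_leading_dotslash` is hypothetical, and the handling of repeated slashes is a guess rather than the behavior of extract.cc.

```cpp
#include <string>

// Hypothetical helper: drop every leading "./" (and any extra slashes that
// follow it) so that "./bar" and "bar" compare equal.
std::string remove_leading_dotslash( const std::string & name )
  {
  std::string::size_type pos = 0;
  while( name.compare( pos, 2, "./" ) == 0 )
    {
    pos += 2;
    while( pos < name.size() && name[pos] == '/' ) ++pos;
    }
  return name.substr( pos );
  }
```

Applying the same function to both the member name stored in the archive and the name given in the command line makes the comparison independent of which side carries the './'.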
|
||||||
|
|
||||||
tarlz supports the following options:
|
tarlz supports the following options:
|
||||||
|
|
||||||
|
@ -110,10 +108,22 @@ component are skipped.
|
||||||
'-V'
|
'-V'
|
||||||
'--version'
|
'--version'
|
||||||
Print the version number of tarlz on the standard output and exit.
|
Print the version number of tarlz on the standard output and exit.
|
||||||
|
This version number should be included in all bug reports.
|
||||||
|
|
||||||
|
'-A'
|
||||||
|
'--concatenate'
|
||||||
|
Append tar.lz archives to the end of a tar.lz archive. All the
|
||||||
|
archives involved must be regular (seekable) files compressed as
|
||||||
|
multimember lzip files, and the two end-of-file blocks plus any
|
||||||
|
zero padding must be contained in the last lzip member of each
|
||||||
|
archive. The intermediate end-of-file blocks are removed as each
|
||||||
|
new archive is concatenated. Exit with status 0 without modifying
|
||||||
|
the archive if no FILES have been specified. Tarlz can't
|
||||||
|
concatenate uncompressed tar archives.
|
||||||
|
|
||||||
'-c'
|
'-c'
|
||||||
'--create'
|
'--create'
|
||||||
Create a new archive.
|
Create a new archive from FILES.
|
||||||
|
|
||||||
'-C DIR'
|
'-C DIR'
|
||||||
'--directory=DIR'
|
'--directory=DIR'
|
||||||
|
@ -137,18 +147,19 @@ component are skipped.
|
||||||
|
|
||||||
'-r'
|
'-r'
|
||||||
'--append'
|
'--append'
|
||||||
Append files to the end of an archive. The archive must be a
|
Append files to the end of a tar.lz archive. The archive must be a
|
||||||
regular (seekable) file compressed as a multimember lzip file, and
|
regular (seekable) file compressed as a multimember lzip file, and
|
||||||
the two end-of-file blocks plus any zero padding must be contained
|
the two end-of-file blocks plus any zero padding must be contained
|
||||||
in the last lzip member of the archive. First this last member is
|
in the last lzip member of the archive. First this last member is
|
||||||
removed, then the new members are appended, and then a new
|
removed, then the new members are appended, and then a new
|
||||||
end-of-file member is appended to the archive. Exit with status 0
|
end-of-file member is appended to the archive. Exit with status 0
|
||||||
without modifying the archive if no FILES have been specified.
|
without modifying the archive if no FILES have been specified.
|
||||||
tarlz can't append files to an uncompressed tar archive.
|
Tarlz can't append files to an uncompressed tar archive.
|
||||||
|
|
||||||
'-t'
|
'-t'
|
||||||
'--list'
|
'--list'
|
||||||
List the contents of an archive.
|
List the contents of an archive. If FILES are given, list only the
|
||||||
|
given FILES.
|
||||||
|
|
||||||
'-v'
|
'-v'
|
||||||
'--verbose'
|
'--verbose'
|
||||||
|
@ -156,10 +167,14 @@ component are skipped.
|
||||||
|
|
||||||
'-x'
|
'-x'
|
||||||
'--extract'
|
'--extract'
|
||||||
Extract files from an archive.
|
Extract files from an archive. If FILES are given, extract only
|
||||||
|
the given FILES. Else extract all the files in the archive.
|
||||||
|
|
||||||
'-0 .. -9'
|
'-0 .. -9'
|
||||||
Set the compression level. The default compression level is '-6'.
|
Set the compression level. The default compression level is '-6'.
|
||||||
|
Like lzip, tarlz also minimizes the dictionary size of the lzip
|
||||||
|
members it creates, reducing the amount of memory required for
|
||||||
|
decompression.
|
||||||
|
|
||||||
'--asolid'
|
'--asolid'
|
||||||
When creating or appending to a compressed archive, use appendable
|
When creating or appending to a compressed archive, use appendable
|
||||||
|
@ -175,22 +190,49 @@ component are skipped.
|
||||||
creates a compressed appendable archive with a separate lzip
|
creates a compressed appendable archive with a separate lzip
|
||||||
member for each top-level directory.
|
member for each top-level directory.
|
||||||
|
|
||||||
|
'--no-solid'
|
||||||
|
When creating or appending to a compressed archive, compress each
|
||||||
|
file separately. The end-of-file blocks are compressed into a
|
||||||
|
separate lzip member. This creates a compressed appendable archive
|
||||||
|
with a separate lzip member for each file. This option allows
|
||||||
|
tarlz to revert to default behavior if, for example, tarlz is invoked
|
||||||
|
through an alias like 'tar='tarlz --solid''.
|
||||||
|
|
||||||
'--solid'
|
'--solid'
|
||||||
When creating or appending to a compressed archive, use solid
|
When creating or appending to a compressed archive, use solid
|
||||||
compression. The files being added to the archive, along with the
|
compression. The files being added to the archive, along with the
|
||||||
end-of-file blocks, are compressed into a single lzip member. The
|
end-of-file blocks, are compressed into a single lzip member. The
|
||||||
resulting archive is not appendable. No more files can be later
|
resulting archive is not appendable. No more files can be later
|
||||||
appended to the archive without decompressing it first.
|
appended to the archive.
|
||||||
|
|
||||||
|
'--anonymous'
|
||||||
|
Equivalent to '--owner=root --group=root'.
|
||||||
|
|
||||||
|
'--owner=OWNER'
|
||||||
|
When creating or appending, use OWNER for files added to the
|
||||||
|
archive. If OWNER is not a valid user name, it is decoded as a
|
||||||
|
decimal numeric user ID.
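
The "name first, then decimal number" decoding described for OWNER can be sketched as follows. This is an illustration under assumptions, not the parser used by tarlz: the function name `parse_owner` and the -1 error convention are hypothetical, and the 2_097_151 ceiling simply mirrors the ustar range check that create.cc applies to uid and gid.

```cpp
#include <cerrno>
#include <cstdlib>
#include <pwd.h>
#include <sys/types.h>

// Hypothetical helper: return the user ID named by 'arg', or -1 if 'arg'
// is neither a known user name nor a decimal number in ustar range.
long parse_owner( const char * const arg )
  {
  const struct passwd * const pw = getpwnam( arg );	// try as user name
  if( pw ) return pw->pw_uid;
  if( arg[0] < '0' || arg[0] > '9' ) return -1;		// not a number
  char * tail;
  errno = 0;
  const unsigned long n = std::strtoul( arg, &tail, 10 );
  if( *tail != 0 || errno || n > 2097151UL ) return -1;
  return n;
  }
```

The same shape works for GROUP by replacing `getpwnam` with `getgrnam` and `pw_uid` with `gr_gid`.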
|
||||||
|
|
||||||
'--group=GROUP'
|
'--group=GROUP'
|
||||||
When creating or appending, use GROUP for files added to the
|
When creating or appending, use GROUP for files added to the
|
||||||
archive. If GROUP is not a valid group name, it is decoded as a
|
archive. If GROUP is not a valid group name, it is decoded as a
|
||||||
decimal numeric group ID.
|
decimal numeric group ID.
|
||||||
|
|
||||||
'--owner=OWNER'
|
'--keep-damaged'
|
||||||
When creating or appending, use OWNER for files added to the
|
Don't delete partially extracted files. If a decompression error
|
||||||
archive. If OWNER is not a valid user name, it is decoded as a
|
happens while extracting a file, keep the partial data extracted.
|
||||||
decimal numeric user ID.
|
Use this option to recover as much data as possible from each
|
||||||
|
damaged member.
|
||||||
|
|
||||||
|
'--missing-crc'
|
||||||
|
Exit with error status 2 if the CRC of the extended records is
|
||||||
|
missing. When this option is used, tarlz detects any corruption
|
||||||
|
in the extended records (only limited by CRC collisions). But note
|
||||||
|
that a corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is
|
||||||
|
reported as a missing CRC instead of as a corrupt record. This
|
||||||
|
misleading 'Missing CRC' message is the consequence of a flaw in
|
||||||
|
the posix pax format; i.e., the lack of a mandatory check sequence
|
||||||
|
in the extended records. *Note crc32::.
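
For reference, the CRC32-C (Castagnoli) checksum named in the ChangeLog can be computed with a short bitwise routine. This is a generic sketch of the checksum only; how tarlz lays out the 'GNU.crc32' record and which bytes it feeds to the CRC are not shown here and are not assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC32-C using the reflected Castagnoli polynomial 0x82F63B78.
// A table-driven version would be faster; this one favors brevity.
uint32_t crc32c( const uint8_t * const buf, const size_t size )
  {
  uint32_t crc = 0xFFFFFFFFU;
  for( size_t i = 0; i < size; ++i )
    {
    crc ^= buf[i];
    for( int k = 0; k < 8; ++k )
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0x82F63B78U : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFU;
  }
```

The standard check value for the nine ASCII bytes "123456789" is 0xE3069283, which is a convenient self-test for any implementation.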
|
||||||
|
|
||||||
'--uncompressed'
|
'--uncompressed'
|
||||||
With '--create', don't compress the created tar archive. Create an
|
With '--create', don't compress the created tar archive. Create an
|
||||||
|
@ -203,9 +245,337 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
|
||||||
caused tarlz to panic.
|
caused tarlz to panic.
|
||||||
|
|
||||||
|
|
||||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Invoking tarlz, Up: Top
|
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Invoking tarlz, Up: Top
|
||||||
|
|
||||||
3 A small tutorial with examples
|
3 File format
|
||||||
|
*************
|
||||||
|
|
||||||
|
In the diagram below, a box like this:
|
||||||
|
+---+
|
||||||
|
| | <-- the vertical bars might be missing
|
||||||
|
+---+
|
||||||
|
|
||||||
|
represents one byte; a box like this:
|
||||||
|
+==============+
|
||||||
|
| |
|
||||||
|
+==============+
|
||||||
|
|
||||||
|
represents a variable number of bytes or a fixed but large number of
|
||||||
|
bytes (for example 512).
|
||||||
|
|
||||||
|
|
||||||
|
A tar.lz file consists of a series of lzip members (compressed data
|
||||||
|
sets). The members simply appear one after another in the file, with no
|
||||||
|
additional information before, between, or after them.
|
||||||
|
|
||||||
|
Each lzip member contains one or more tar members in a simplified
|
||||||
|
posix pax interchange format; the only pax typeflag value supported by
|
||||||
|
tarlz (in addition to the typeflag values defined by the ustar format)
|
||||||
|
is 'x'. The pax format is an extension on top of the ustar format that
|
||||||
|
removes the size limitations of the ustar format.
|
||||||
|
|
||||||
|
Each tar member contains one file archived, and is represented by the
|
||||||
|
following sequence:
|
||||||
|
|
||||||
|
* An optional extended header block with extended header records.
|
||||||
|
This header block is of the form described in pax header block,
|
||||||
|
with a typeflag value of 'x'. The extended header records are
|
||||||
|
included as the data for this header block.
|
||||||
|
|
||||||
|
* A header block in ustar format that describes the file. Any fields
|
||||||
|
defined in the preceding optional extended header records override
|
||||||
|
the associated fields in this header block for this file.
|
||||||
|
|
||||||
|
* Zero or more blocks that contain the contents of the file.
|
||||||
|
|
||||||
|
At the end of the archive file there are two 512-byte blocks filled
|
||||||
|
with binary zeros, interpreted as an end-of-archive indicator. These EOF
|
||||||
|
blocks are either compressed in a separate lzip member or compressed
|
||||||
|
along with the tar members contained in the last lzip member.
|
||||||
|
|
||||||
|
The diagram below shows the correspondence between each tar member
|
||||||
|
(formed by one or two headers plus optional data) in the tar archive and
|
||||||
|
each lzip member in the resulting multimember tar.lz archive: *Note
|
||||||
|
File format: (lzip)File format.
|
||||||
|
|
||||||
|
tar
|
||||||
|
+========+======+=================+===============+========+======+========+
|
||||||
|
| header | data | extended header | extended data | header | data | EOF |
|
||||||
|
+========+======+=================+===============+========+======+========+
|
||||||
|
|
||||||
|
tar.lz
|
||||||
|
+===============+=================================================+========+
|
||||||
|
| member | member | member |
|
||||||
|
+===============+=================================================+========+
|
||||||
|
|
||||||
|
|
||||||
|
3.1 Pax header block
|
||||||
|
====================
|
||||||
|
|
||||||
|
The pax header block is identical to the ustar header block described
|
||||||
|
below except that the typeflag has the value 'x' (extended). The size
|
||||||
|
field is the size of the extended header data in bytes. Most other
|
||||||
|
fields in the pax header block are zeroed on archive creation to
|
||||||
|
prevent trouble if the archive is read by a ustar tool, and are
|
||||||
|
ignored by tarlz on archive extraction. *Note flawed-compat::.
|
||||||
|
|
||||||
|
The pax extended header data consists of one or more records, each of
|
||||||
|
them constructed as follows:
|
||||||
|
'"%d %s=%s\n", <length>, <keyword>, <value>'
|
||||||
|
|
||||||
|
The <length>, <blank>, <keyword>, <equals-sign>, and <newline> in the
|
||||||
|
record must be limited to the portable character set. The <length> field
|
||||||
|
contains the decimal length of the record in bytes, including the
|
||||||
|
trailing <newline>. The <value> field is stored as-is, without
|
||||||
|
conversion to UTF-8 nor any other transformation.
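
The record layout above can be sketched in a few lines of Python. This is only an illustration, not code from tarlz; the function name 'pax_record' is hypothetical. The one subtle point is that <length> counts its own decimal digits, so the sketch iterates until the length is self-consistent.

```python
def pax_record(keyword: str, value: bytes) -> bytes:
    # Build one extended record: <length> <keyword>=<value><newline>.
    # 'body' is everything except the decimal digits of <length>.
    body = b" " + keyword.encode("ascii") + b"=" + value + b"\n"
    # <length> includes its own digits, so iterate to a fixed point.
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


if __name__ == "__main__":
    print(pax_record("path", b"some/long/file/name"))
```

The value is passed as bytes and copied unmodified, matching the statement that <value> is stored as-is.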
|
||||||
|
|
||||||
|
These are the <keyword> fields currently supported by tarlz:
|
||||||
|
|
||||||
|
'linkpath'
|
||||||
|
The pathname of a link being created to another file, of any type,
|
||||||
|
previously archived. This record overrides the linkname field in
|
||||||
|
the following ustar header block. The following ustar header block
|
||||||
|
determines the type of link created. If typeflag of the following
|
||||||
|
header block is 1, it will be a hard link. If typeflag is 2, it
|
||||||
|
will be a symbolic link and the linkpath value will be used as the
|
||||||
|
contents of the symbolic link.
|
||||||
|
|
||||||
|
'path'
|
||||||
|
The pathname of the following file. This record overrides the name
|
||||||
|
and prefix fields in the following ustar header block.
|
||||||
|
|
||||||
|
'size'
|
||||||
|
The size of the file in bytes, expressed as a decimal number using
|
||||||
|
digits from the ISO/IEC 646:1991 (ASCII) standard. This record
|
||||||
|
overrides the size field in the following ustar header block. The
|
||||||
|
size record is used only for files with a size value greater than
|
||||||
|
8_589_934_591 (octal 77777777777); that is, for files of 2^33 bytes or larger.
|
||||||
|
|
||||||
|
'GNU.crc32'
|
||||||
|
CRC32-C (Castagnoli) of the extended header data excluding the 8
|
||||||
|
bytes representing the CRC <value> itself. The <value> is
|
||||||
|
represented as 8 hexadecimal digits in big endian order,
|
||||||
|
'22 GNU.crc32=00000000\n'. The keyword of the CRC record is
|
||||||
|
protected by the CRC to guarantee that corruption is always detected
|
||||||
|
(except in case of CRC collision). A CRC was chosen because a
|
||||||
|
checksum is too weak for a potentially large list of variable
|
||||||
|
sized records. A checksum can't detect simple errors like the
|
||||||
|
swapping of two bytes.
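
The difference between a checksum and a CRC can be seen with a small Python sketch. It is an illustration only: 'crc32c' is a plain bit-at-a-time CRC32-C using the reflected Castagnoli polynomial 0x82F63B78, not the implementation used by tarlz, and 'simple_sum' is a hypothetical stand-in for a byte-sum checksum.

```python
def crc32c(data: bytes) -> int:
    # Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def simple_sum(data: bytes) -> int:
    # Byte-sum checksum: the order of the bytes does not matter.
    return sum(data)


good = b"path=dir/ab"
swapped = b"path=dir/ba"    # the last two bytes exchanged

if __name__ == "__main__":
    print(simple_sum(good) == simple_sum(swapped))   # sums are equal
    print(crc32c(good) == crc32c(swapped))           # CRCs differ
```

The byte sum is identical for both strings, so the swap goes unnoticed, while the two CRC values differ.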
|
||||||
|
|
||||||
|
|
||||||
|
3.2 Ustar header block
|
||||||
|
======================
|
||||||
|
|
||||||
|
The ustar header block has a length of 512 bytes and is structured as
|
||||||
|
shown in the following table. All lengths and offsets are in decimal.
|
||||||
|
|
||||||
|
Field Name Offset Length (in bytes)
|
||||||
|
name 0 100
|
||||||
|
mode 100 8
|
||||||
|
uid 108 8
|
||||||
|
gid 116 8
|
||||||
|
size 124 12
|
||||||
|
mtime 136 12
|
||||||
|
chksum 148 8
|
||||||
|
typeflag 156 1
|
||||||
|
linkname 157 100
|
||||||
|
magic 257 6
|
||||||
|
version 263 2
|
||||||
|
uname 265 32
|
||||||
|
gname 297 32
|
||||||
|
devmajor 329 8
|
||||||
|
devminor 337 8
|
||||||
|
prefix 345 155
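
As a sketch of how the table can be used, the following Python fragment transcribes the offsets and lengths above and slices a 512-byte block into raw fields. The names 'USTAR_FIELDS' and 'split_header' are illustrative and do not come from tarlz.

```python
# (field name, offset, length) exactly as listed in the table above.
USTAR_FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
]


def split_header(block: bytes) -> dict:
    # Return the raw bytes of each field; no decoding is attempted.
    if len(block) != 512:
        raise ValueError("a ustar header block is exactly 512 bytes")
    return {name: block[off:off + length] for name, off, length in USTAR_FIELDS}
```

The fields are contiguous and end at offset 500; the remaining 12 bytes of the block are not assigned to any field in the table.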
|
||||||
|
|
||||||
|
All characters in the header block are coded using the ISO/IEC
|
||||||
|
646:1991 (ASCII) standard, except in fields storing names for files,
|
||||||
|
users, and groups. For maximum portability between implementations,
|
||||||
|
names should only contain characters from the portable filename
|
||||||
|
character set. But if an implementation supports the use of characters
|
||||||
|
outside of '/' and the portable filename character set in names for
|
||||||
|
files, users, and groups, tarlz will use the byte values in these names
|
||||||
|
unmodified.
|
||||||
|
|
||||||
|
The fields name, linkname, and prefix are null-terminated character
|
||||||
|
strings except when all characters in the array contain non-null
|
||||||
|
characters including the last character.
|
||||||
|
|
||||||
|
The name and the prefix fields produce the pathname of the file. A
|
||||||
|
new pathname is formed, if prefix is not an empty string (its first
|
||||||
|
character is not null), by concatenating prefix (up to the first null
|
||||||
|
character), a <slash> character, and name; otherwise, name is used
|
||||||
|
alone. In either case, name is terminated at the first null character.
|
||||||
|
If prefix begins with a null character, it is ignored. In this manner,
|
||||||
|
pathnames of at most 256 characters can be supported. If a pathname does
|
||||||
|
not fit in the space provided, an extended record is used to store the
|
||||||
|
pathname.
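
The rule for combining prefix and name can be sketched as follows. This is a minimal Python illustration under the rules stated above; 'ustar_pathname' is a hypothetical helper that takes the raw 100-byte name field and the raw 155-byte prefix field.

```python
def ustar_pathname(name_field: bytes, prefix_field: bytes) -> bytes:
    # Each field ends at its first null byte, or fills the whole array.
    name = name_field.split(b"\0", 1)[0]
    prefix = prefix_field.split(b"\0", 1)[0]
    # An empty prefix (first byte null) is ignored.
    return prefix + b"/" + name if prefix else name
```

With both fields completely full, the result is 155 + 1 + 100 = 256 bytes, which is the limit mentioned above.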
|
||||||
|
|
||||||
|
The linkname field does not use the prefix to produce a pathname. If
|
||||||
|
the linkname does not fit in the 100 characters provided, an extended
|
||||||
|
record is used to store the linkname.
|
||||||
|
|
||||||
|
The mode field provides 12 access permission bits. The following
|
||||||
|
table shows the symbolic name of each bit and its octal value:
|
||||||
|
|
||||||
|
Bit Name Bit value
|
||||||
|
S_ISUID 04000
|
||||||
|
S_ISGID 02000
|
||||||
|
S_ISVTX 01000
|
||||||
|
S_IRUSR 00400
|
||||||
|
S_IWUSR 00200
|
||||||
|
S_IXUSR 00100
|
||||||
|
S_IRGRP 00040
|
||||||
|
S_IWGRP 00020
|
||||||
|
S_IXGRP 00010
|
||||||
|
S_IROTH 00004
|
||||||
|
S_IWOTH 00002
|
||||||
|
S_IXOTH 00001
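
For illustration, a mode field can be decoded against this table as in the Python sketch below. The bit values are copied from the table; 'MODE_BITS' and 'mode_bit_names' are hypothetical names, and the sketch assumes the field holds a null-terminated octal number as described at the end of this section.

```python
# Symbolic names and octal values, as listed in the table above.
MODE_BITS = [
    ("S_ISUID", 0o4000), ("S_ISGID", 0o2000), ("S_ISVTX", 0o1000),
    ("S_IRUSR", 0o0400), ("S_IWUSR", 0o0200), ("S_IXUSR", 0o0100),
    ("S_IRGRP", 0o0040), ("S_IWGRP", 0o0020), ("S_IXGRP", 0o0010),
    ("S_IROTH", 0o0004), ("S_IWOTH", 0o0002), ("S_IXOTH", 0o0001),
]


def mode_bit_names(mode_field: bytes) -> list:
    # Strip the terminator, parse as octal, then list the bits that are set.
    mode = int(mode_field.rstrip(b"\0 ").decode("ascii"), 8)
    return [name for name, bit in MODE_BITS if mode & bit]
```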
|
||||||
|
|
||||||
|
The uid and gid fields are the user and group ID of the owner and
|
||||||
|
group of the file, respectively.
|
||||||
|
|
||||||
|
The size field contains the octal representation of the size of the
|
||||||
|
file in bytes. If the typeflag field specifies a file of type '0'
|
||||||
|
(regular file) or '7' (high performance regular file), the number of
|
||||||
|
logical records following the header is (size / 512) rounded up to the next
|
||||||
|
integer. For all other values of typeflag, tarlz either sets the size
|
||||||
|
field to 0 or ignores it, and does not store or expect any logical
|
||||||
|
records following the header. If the file size is larger than
|
||||||
|
8_589_934_591 bytes (octal 77777777777), an extended record is used to
|
||||||
|
store the file size.
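
The arithmetic above can be checked with a short Python sketch. The names 'USTAR_MAX_SIZE', 'data_blocks' and 'needs_size_record' are illustrative only.

```python
# Largest value that fits in 11 octal digits: 2^33 - 1.
USTAR_MAX_SIZE = 0o77777777777


def data_blocks(size: int) -> int:
    # Number of 512-byte logical records needed for 'size' bytes.
    return (size + 511) // 512


def needs_size_record(size: int) -> bool:
    # True if the size does not fit in the ustar size field.
    return size > USTAR_MAX_SIZE
```

For example, a file of 513 bytes occupies two logical records, and a file of exactly 2^33 bytes is the smallest one that needs an extended 'size' record.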
|
||||||
|
|
||||||
|
The mtime field contains the octal representation of the modification
|
||||||
|
time of the file at the time it was archived, obtained from the stat()
|
||||||
|
function.
|
||||||
|
|
||||||
|
The chksum field contains the octal representation of the value of
|
||||||
|
the simple sum of all bytes in the header logical record. Each byte in
|
||||||
|
the header is treated as an unsigned value. When calculating the
|
||||||
|
checksum, the chksum field is treated as if it were all <space>
|
||||||
|
characters.
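
A minimal Python sketch of this calculation follows. It is an illustration, not the code used by tarlz; 'ustar_chksum' and 'chksum_matches' are hypothetical names, and the chksum field is assumed to sit at offset 148 with length 8, as in the table above.

```python
def ustar_chksum(block: bytes) -> int:
    # Sum of all 512 header bytes as unsigned values, with the 8 bytes
    # of the chksum field (offsets 148 to 155) counted as spaces (0x20).
    if len(block) != 512:
        raise ValueError("a ustar header block is exactly 512 bytes")
    return sum(block[:148]) + 8 * 0x20 + sum(block[156:])


def chksum_matches(block: bytes) -> bool:
    # Compare the stored octal value with the recomputed sum.
    stored = block[148:156].strip(b"\0 ")
    return stored != b"" and int(stored.decode("ascii"), 8) == ustar_chksum(block)
```

Because the field is counted as spaces, writing the computed value into the header does not change the sum.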
|
||||||
|
|
||||||
|
The typeflag field contains a single character specifying the type of
|
||||||
|
file archived:
|
||||||
|
|
||||||
|
''0''
|
||||||
|
Regular file.
|
||||||
|
|
||||||
|
''1''
|
||||||
|
Hard link to another file, of any type, previously archived.
|
||||||
|
|
||||||
|
''2''
|
||||||
|
Symbolic link.
|
||||||
|
|
||||||
|
''3', '4''
|
||||||
|
Character special file and block special file respectively. In
|
||||||
|
this case the devmajor and devminor fields contain information
|
||||||
|
defining the device in unspecified format.
|
||||||
|
|
||||||
|
''5''
|
||||||
|
Directory.
|
||||||
|
|
||||||
|
''6''
|
||||||
|
FIFO special file.
|
||||||
|
|
||||||
|
''7''
|
||||||
|
Reserved to represent a file to which an implementation has
|
||||||
|
associated some high-performance attribute. Tarlz treats this type
|
||||||
|
of file as a regular file (type 0).
|
||||||
|
|
||||||
|
|
||||||
|
The magic field contains the ASCII null-terminated string "ustar".
|
||||||
|
The version field contains the characters "00" (0x30,0x30). The fields
|
||||||
|
uname and gname are null-terminated character strings. Each numeric
|
||||||
|
field contains a leading zero-filled, null-terminated octal number using
|
||||||
|
digits from the ISO/IEC 646:1991 (ASCII) standard.
|
||||||
|
|
||||||
|
|
||||||
|
File: tarlz.info, Node: Amendments to pax format, Next: Examples, Prev: File format, Up: Top
|
||||||
|
|
||||||
|
4 The reasons for the differences with pax
|
||||||
|
******************************************
|
||||||
|
|
||||||
|
Tarlz is meant to reliably detect invalid or corrupt metadata during
|
||||||
|
extraction and to not create safety risks in the archives it creates. In
|
||||||
|
order to achieve these goals, tarlz makes some changes to the variant
|
||||||
|
of the pax format that it uses. This chapter describes these changes
|
||||||
|
and the concrete reasons to implement them.
|
||||||
|
|
||||||
|
|
||||||
|
4.1 Add a CRC of the extended records
|
||||||
|
=====================================
|
||||||
|
|
||||||
|
The posix pax format has a serious flaw. The metadata stored in pax
|
||||||
|
extended records are not protected by any kind of check sequence.
|
||||||
|
Corruption in a long filename may cause the extraction of the file in
|
||||||
|
the wrong place without warning. Corruption in a long file size may
|
||||||
|
cause the truncation of the file or the appending of garbage to the
|
||||||
|
file, both followed by a spurious warning about a corrupt header far
|
||||||
|
from the place of the undetected corruption.
|
||||||
|
|
||||||
|
Metadata like filename and file size must always be protected in an
|
||||||
|
archive format because of the adverse effects of undetected corruption
|
||||||
|
in them, potentially much worse than undetected corruption in the data.
|
||||||
|
Even more so in the case of pax because the amount of metadata it
|
||||||
|
stores is potentially large, making undetected corruption more probable.
|
||||||
|
|
||||||
|
Because of the above, tarlz protects the extended records with a CRC
|
||||||
|
in a way compatible with standard tar tools. *Note key_crc32::.
|
||||||
|
|
||||||
|
|
||||||
|
4.2 Remove flawed backward compatibility
|
||||||
|
========================================
|
||||||
|
|
||||||
|
In order to allow the extraction of pax archives by a tar utility
|
||||||
|
conforming to the POSIX-2:1993 standard, POSIX.1-2008 recommends
|
||||||
|
selecting extended header field values that allow such tar to create a
|
||||||
|
regular file containing the extended header records as data. This
|
||||||
|
approach is broken because if the extended header is needed because of
|
||||||
|
a long filename, the name and prefix fields will be unable to contain
|
||||||
|
the full pathname of the file. Therefore the files corresponding to
|
||||||
|
both the extended header and the overridden ustar header will be
|
||||||
|
extracted using truncated filenames, perhaps overwriting existing files
|
||||||
|
or directories. It may be a security risk to extract a file with a
|
||||||
|
truncated filename.
|
||||||
|
|
||||||
|
To avoid this problem, tarlz writes extended headers with all fields
|
||||||
|
zeroed except size, chksum, typeflag, magic and version. This prevents
|
||||||
|
old tar programs from extracting the extended records as a file in the
|
||||||
|
wrong place. Tarlz also sets to zero those fields of the ustar header
|
||||||
|
overridden by extended records.
|
||||||
|
|
||||||
|
If the extended header is needed because of a file size larger than
|
||||||
|
8 GiB, the size field will be unable to contain the full size of the
|
||||||
|
file. Therefore the file may be partially extracted, and the tool will
|
||||||
|
issue a spurious warning about a corrupt header at the point where it
|
||||||
|
thinks the file ends. Setting to zero the overridden size in the ustar
|
||||||
|
header at least prevents the partial extraction and makes it obvious that
|
||||||
|
the file has been truncated.
|
||||||
|
|
||||||
|
|
||||||
|
4.3 As simple as possible (but not simpler)
|
||||||
|
===========================================
|
||||||
|
|
||||||
|
The tarlz format is mainly ustar. Extended pax headers are used only
|
||||||
|
when needed because the length of a filename or link name, or the size
|
||||||
|
of a file exceeds the limits of the ustar format. Adding extended
|
||||||
|
headers to each member just to record subsecond timestamps seems
|
||||||
|
wasteful for a backup format.
|
||||||
|
|
||||||
|
|
||||||
|
4.4 Avoid misconversions to/from UTF-8
|
||||||
|
======================================
|
||||||
|
|
||||||
|
There is no portable way to tell what charset a text string is encoded
|
||||||
|
in. Therefore, tarlz stores all fields representing text strings
|
||||||
|
as-is, without conversion to UTF-8 nor any other transformation. This
|
||||||
|
prevents accidental double UTF-8 conversions. If the need arises, this
|
||||||
|
behavior will be adjusted with a command line option in the future.
|
||||||
|
|
||||||
|
|
||||||
|
File: tarlz.info, Node: Examples, Next: Problems, Prev: Amendments to pax format, Up: Top
|
||||||
|
|
||||||
|
5 A small tutorial with examples
|
||||||
********************************
|
********************************
|
||||||
|
|
||||||
Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
||||||
|
@ -232,7 +602,7 @@ Example 4: Create a compressed appendable archive containing directories
|
||||||
'dir1', 'dir2' and 'dir3' with a separate lzip member per directory.
|
'dir1', 'dir2' and 'dir3' with a separate lzip member per directory.
|
||||||
Then append files 'a', 'b', 'c', 'd' and 'e' to the archive, all of
|
Then append files 'a', 'b', 'c', 'd' and 'e' to the archive, all of
|
||||||
them contained in a single lzip member. The resulting archive
|
them contained in a single lzip member. The resulting archive
|
||||||
'archive.tar.lz' contains 5 lzip members (including the eof member).
|
'archive.tar.lz' contains 5 lzip members (including the EOF member).
|
||||||
|
|
||||||
tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
|
tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
|
||||||
tarlz --asolid -rf archive.tar.lz a b c d e
|
tarlz --asolid -rf archive.tar.lz a b c d e
|
||||||
|
@ -240,7 +610,7 @@ them contained in a single lzip member. The resulting archive
|
||||||
|
|
||||||
Example 5: Create a solidly compressed archive 'archive.tar.lz'
|
Example 5: Create a solidly compressed archive 'archive.tar.lz'
|
||||||
containing files 'a', 'b' and 'c'. Note that no more files can be later
|
containing files 'a', 'b' and 'c'. Note that no more files can be later
|
||||||
appended to the archive without decompressing it first.
|
appended to the archive.
|
||||||
|
|
||||||
tarlz --solid -cf archive.tar.lz a b c
|
tarlz --solid -cf archive.tar.lz a b c
|
||||||
|
|
||||||
|
@ -263,7 +633,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
|
||||||
|
|
||||||
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||||
|
|
||||||
4 Reporting bugs
|
6 Reporting bugs
|
||||||
****************
|
****************
|
||||||
|
|
||||||
There are probably bugs in tarlz. There are certainly errors and
|
There are probably bugs in tarlz. There are certainly errors and
|
||||||
|
@ -284,8 +654,11 @@ Concept index
|
||||||
|