Merging upstream version 0.8.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent c7152715b0
commit d91c44b5bd
28 changed files with 2668 additions and 574 deletions
36 ChangeLog
@@ -1,3 +1,39 @@
2018-12-16  Antonio Diaz Diaz  <antonio@gnu.org>

	* Version 0.8 released.
	* Added new option '--anonymous' (--owner=root --group=root).
	* extract.cc (decode): 'tarlz -xf foo ./bar' now extracts 'bar'.
	* create.cc: Set to zero most fields in extended headers.
	* tarlz.texi: Added new chapter 'Amendments to pax format'.

2018-11-23  Antonio Diaz Diaz  <antonio@gnu.org>

	* Version 0.7 released.
	* Added new option '--keep-damaged'.
	* Added new option '--no-solid'.
	* create.cc (archive_write): Minimize dictionary size.
	* create.cc: Detect and skip archive in '-A', '-c' and '-r'.
	* main.cc (show_version): Show the version of lzlib being used.

2018-10-19  Antonio Diaz Diaz  <antonio@gnu.org>

	* Version 0.6 released.
	* Added new option '-A, --concatenate'.
	* Option '--ignore-crc' replaced with '--missing-crc'.
	* create.cc (add_member): Test that uid, gid, mtime, devmajor
	  and devminor are in ustar range.
	* configure: Accept appending to CXXFLAGS, 'CXXFLAGS+=OPTIONS'.
	* Makefile.in: Use tarlz in target 'dist'.

2018-09-29  Antonio Diaz Diaz  <antonio@gnu.org>

	* Version 0.5 released.
	* Implemented simplified posix pax format.
	* Implemented CRC32-C (Castagnoli) of the extended header data.
	* Added new option '--ignore-crc'.
	* Added missing #includes for major, minor and makedev.
	* tarlz.texi: Documented the new archive format.

2018-04-23  Antonio Diaz Diaz  <antonio@gnu.org>

	* Version 0.4 released.
7 INSTALL
@@ -3,6 +3,9 @@ Requirements
You will need a C++ compiler and the lzlib compression library installed.
I use gcc 5.3.0 and 4.1.2, but the code should compile with any
standards compliant compiler.
Lzlib must be version 1.0 or newer, but --keep-damaged requires lzlib
1.11-rc2 or newer to recover as much data as possible from each damaged
member.
Gcc is available at http://gcc.gnu.org.
Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.

@@ -24,6 +27,10 @@ the main archive.
     cd tarlz[version]
     ./configure

   To link against a lzlib not installed in a standard place, use:

     ./configure CPPFLAGS='-I<dir_of_lzlib.h>' LDFLAGS='-L<dir_of_liblz.a>'

3. Run make.

     make
13 Makefile.in
@@ -101,7 +101,7 @@ uninstall-man :

dist : doc
	ln -sf $(VPATH) $(DISTNAME)
	tar -Hustar --owner=root --group=root -cvf $(DISTNAME).tar \
	tarlz --solid --owner=root --group=root -9cvf $(DISTNAME).tar.lz \
	  $(DISTNAME)/AUTHORS \
	  $(DISTNAME)/COPYING \
	  $(DISTNAME)/ChangeLog \
@@ -118,17 +118,24 @@ dist : doc
	  $(DISTNAME)/testsuite/check.sh \
	  $(DISTNAME)/testsuite/test.txt \
	  $(DISTNAME)/testsuite/test.txt.tar \
	  $(DISTNAME)/testsuite/test_bad1.txt.tar \
	  $(DISTNAME)/testsuite/test_bad[12].txt \
	  $(DISTNAME)/testsuite/t155.tar \
	  $(DISTNAME)/testsuite/test3.tar \
	  $(DISTNAME)/testsuite/test3_bad[1-5].tar \
	  $(DISTNAME)/testsuite/test.txt.lz \
	  $(DISTNAME)/testsuite/test.txt.tar.lz \
	  $(DISTNAME)/testsuite/test_bad[12].txt.tar.lz \
	  $(DISTNAME)/testsuite/test3.tar.lz \
	  $(DISTNAME)/testsuite/test3a.tar.lz \
	  $(DISTNAME)/testsuite/tlz_in_tar[12].tar \
	  $(DISTNAME)/testsuite/test3_dir.tar.lz \
	  $(DISTNAME)/testsuite/test3_dot.tar.lz \
	  $(DISTNAME)/testsuite/t155.tar.lz \
	  $(DISTNAME)/testsuite/test3_bad[1-6].tar.lz \
	  $(DISTNAME)/testsuite/dotdot[1-5].tar.lz \
	  $(DISTNAME)/testsuite/ug32chars.tar.lz \
	  $(DISTNAME)/testsuite/eof.tar.lz
	rm -f $(DISTNAME)
	lzip -v -9 $(DISTNAME).tar

clean :
	-rm -f $(progname) $(objs)
19 NEWS
@@ -1,5 +1,18 @@
Changes in version 0.4:
Changes in version 0.8:

Some missing #includes have been fixed.
The new option '--anonymous', equivalent to '--owner=root --group=root', has
been added.

Open files in binary mode on OS2.
On extraction and listing, tarlz now removes leading './' strings also from
member names given in the command line. 'tarlz -xf foo ./bar' now extracts
member 'bar' from archive 'foo'. (Reported by Viktor Sergiienko in the
bug-tar mailing list).
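
The name normalization described in this NEWS entry can be pictured with a small sketch. This is not the code from extract.cc; the helper name 'strip_leading_dot_slash' is hypothetical, and skipping extra slashes after each './' is an assumption made for the illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not taken from tarlz): drop every leading "./"
// prefix, together with any extra slashes that follow it.
std::string strip_leading_dot_slash( const std::string & name )
  {
  std::string::size_type pos = 0;
  while( name.compare( pos, 2, "./" ) == 0 )
    {
    pos += 2;
    while( pos < name.size() && name[pos] == '/' ) ++pos;
    }
  assert( pos <= name.size() );		// pos never moves past the end
  return name.substr( pos );
  }
```

Applying the same normalization to the names stored in the archive and to the names given in the command line is what would let './bar' and 'bar' match each other.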

Tarlz now writes extended headers with all fields zeroed except size,
chksum, typeflag, magic and version. This prevents old tar programs from
extracting the extended records as a file in the wrong place (with a
truncated filename). Tarlz now also sets to zero those fields of the ustar
header overridden by extended records.
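
As an illustration of the header layout this entry describes, here is a minimal sketch that fills a 512-byte extended header leaving everything zeroed except size, chksum, typeflag, magic and version. The field offsets are the standard ustar ones and 'x' is the pax typeflag for an extended header; the function 'make_extended_header' is a name invented for this example and is not claimed to match create.cc line for line.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Standard ustar field offsets and lengths (only the fields used here).
enum { header_size = 512, size_o = 124, size_l = 12, chksum_o = 148,
       chksum_l = 8, typeflag_o = 156, magic_o = 257, version_o = 263 };

// Write 'num' as 'size' zero-padded octal digits.
void print_octal( uint8_t * const buf, int size, unsigned long long num )
  { while( --size >= 0 ) { buf[size] = '0' + ( num % 8 ); num /= 8; } }

// Sum of all header bytes, with the chksum field counted as spaces.
unsigned ustar_chksum( const uint8_t * const header )
  {
  unsigned chksum = chksum_l * 0x20;
  for( int i = 0; i < chksum_o; ++i ) chksum += header[i];
  for( int i = chksum_o + chksum_l; i < header_size; ++i ) chksum += header[i];
  return chksum;
  }

// Hypothetical helper: build an extended header for 'edsize' bytes of records.
void make_extended_header( uint8_t * const header,
                           const unsigned long long edsize )
  {
  assert( edsize < ( 1ULL << 33 ) );		// must fit in 11 octal digits
  std::memset( header, 0, header_size );	// name, mode, uid, ... stay zeroed
  std::memcpy( header + magic_o, "ustar", 5 );	// sixth magic byte stays NUL
  header[version_o] = header[version_o+1] = '0';
  header[typeflag_o] = 'x';			// pax extended header
  print_octal( header + size_o, size_l - 1, edsize );
  print_octal( header + chksum_o, chksum_l - 1, ustar_chksum( header ) );
  }
```

Because the name field is all zeros, a tar program that does not know typeflag 'x' has no truncated filename to extract the records to.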

The chapter 'Amendments to pax format', explaining the reasons for the
differences with the pax format, has been added.
77 README
@@ -1,36 +1,16 @@
Description

Tarlz is a small and simple implementation of the tar archiver. By
default tarlz creates, lists and extracts archives in the 'ustar' format
compressed with lzip on a per file basis. Tarlz can append files to the
end of such compressed archives.

Each tar member is compressed in its own lzip member, as well as the
end-of-file blocks. This same method works for any tar format (gnu,
ustar, posix) and is fully backward compatible with standard tar tools
like GNU tar, which treat the resulting multimember tar.lz archive like
any other tar.lz archive.
Tarlz is a small and simple implementation of the tar archiver. By default
tarlz creates, lists and extracts archives in a simplified posix pax format
compressed with lzip on a per file basis. Each tar member is compressed in
its own lzip member, as well as the end-of-file blocks. This method is fully
backward compatible with standard tar tools like GNU tar, which treat the
resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
can append files to the end of such compressed archives.

Tarlz can create tar archives with four levels of compression
granularity; per file, per directory, appendable solid, and solid.

Tarlz is intended as a showcase project for the maintainers of real tar
programs to evaluate the format and perhaps implement it in their tools.

The diagram below shows the correspondence between tar members (formed
by a header plus optional data) in the tar archive and lzip members in
the resulting multimember tar.lz archive:

tar
+========+======+========+======+========+======+========+
| header | data | header | data | header | data |   eof  |
+========+======+========+======+========+======+========+

tar.lz
+===============+===============+===============+========+
|     member    |     member    |     member    | member |
+===============+===============+===============+========+

Of course, compressing each file (or each directory) individually is
less efficient than compressing the whole tar archive, but it has the
following advantages:
@@ -38,19 +18,56 @@ following advantages:
   * The resulting multimember tar.lz archive can be decompressed in
     parallel with plzip, multiplying the decompression speed.

   * New members can be appended to the archive (by removing the eof
   * New members can be appended to the archive (by removing the EOF
     member) just like to an uncompressed tar archive.

   * It is a safe posix-style backup format. In case of corruption,
     tarlz can extract all the undamaged members from the tar.lz
     archive, skipping over the damaged members, just like the standard
     (uncompressed) tar. Moreover, lziprecover can be used to recover at
     least part of the contents of the damaged members.
     (uncompressed) tar. Moreover, the option '--keep-damaged' can be
     used to recover as much data as possible from each damaged member,
     and lziprecover can be used to recover some of the damaged members.

   * A multimember tar.lz archive is usually smaller than the
     corresponding solidly compressed tar.gz archive, except when
     individually compressing files smaller than about 32 KiB.

Note that the posix pax format has a serious flaw. The metadata stored
in pax extended records are not protected by any kind of check sequence.
Corruption in a long filename may cause the extraction of the file in the
wrong place without warning. Corruption in a long file size may cause the
truncation of the file or the appending of garbage to the file, both
followed by a spurious warning about a corrupt header far from the place
of the undetected corruption.

Metadata like filename and file size must be always protected in an archive
format because of the adverse effects of undetected corruption in them,
potentially much worse than undetected corruption in the data. Even more so
in the case of pax because the amount of metadata it stores is potentially
large, making undetected corruption more probable.

Because of the above, tarlz protects the extended records with a CRC in
a way compatible with standard tar tools.
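
The ChangeLog entry for version 0.5 names the check as CRC32-C (Castagnoli). For readers unfamiliar with it, a minimal bitwise sketch of that CRC follows. It only shows how the checksum itself is computed; the function name 'crc32c_sketch' is hypothetical, and how tarlz stores the result inside the extended records is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC32-C (Castagnoli) using the reflected polynomial 0x82F63B78.
// A table-driven version would be faster; this form favors clarity.
uint32_t crc32c_sketch( const uint8_t * const data, const std::size_t size )
  {
  assert( data || size == 0 );
  uint32_t crc = 0xFFFFFFFFU;			// initial value
  for( std::size_t i = 0; i < size; ++i )
    {
    crc ^= data[i];
    for( int k = 0; k < 8; ++k )		// process one bit at a time
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0x82F63B78U : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFU;			// final inversion
  }
```

Any change to a protected byte, such as one character of a long filename, changes the computed value, so the mismatch can be detected before the file is extracted to the wrong place.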

Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.

Tarlz is intended as a showcase project for the maintainers of real tar
programs to evaluate the format and perhaps implement it in their tools.

The diagram below shows the correspondence between each tar member
(formed by one or two headers plus optional data) in the tar archive and
each lzip member in the resulting multimember tar.lz archive:

tar
+========+======+=================+===============+========+======+========+
| header | data | extended header | extended data | header | data |   EOF  |
+========+======+=================+===============+========+======+========+

tar.lz
+===============+=================================================+========+
|     member    |                      member                     | member |
+===============+=================================================+========+
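
The 'extended data' block in the diagram holds pax records of the form "length keyword=value\n", where the decimal length counts the whole record, including its own digits. The sketch below shows one way to compute that self-referential length, following the same arithmetic as 'record_size' in create.cc; the names 'pax_record_size' and 'make_pax_record' are invented for this example.

```cpp
#include <cassert>
#include <string>

unsigned decimal_digits( unsigned long long value )
  {
  unsigned digits = 1;
  while( value >= 10 ) { value /= 10; ++digits; }
  return digits;
  }

// Total record length, counting the digits of the length field itself.
unsigned long long pax_record_size( const std::string & keyword,
                                    const std::string & value )
  {
  // size = ' ' + keyword + '=' + value + '\n'
  const unsigned long long size = 1 + keyword.size() + 1 + value.size() + 1;
  const unsigned d1 = decimal_digits( size );
  // adding the digits may itself add one more digit (e.g. 9 -> 11)
  return decimal_digits( d1 + size ) + size;
  }

// Hypothetical helper: format one complete record.
std::string make_pax_record( const std::string & keyword,
                             const std::string & value )
  {
  const unsigned long long len = pax_record_size( keyword, value );
  const std::string rec =
    std::to_string( len ) + ' ' + keyword + '=' + value + '\n';
  assert( rec.size() == len );		// the length must describe itself
  return rec;
  }
```

For instance, keyword 'path' with value 'foo' gives the 12-byte record "12 path=foo\n", and a 9-character keyword with an 8-character value gives 22 bytes, which matches the fixed "22 GNU.crc32=00000000\n" record written by create.cc.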


Copyright (C) 2013-2018 Antonio Diaz Diaz.

12 configure (vendored)
@@ -6,7 +6,7 @@
# to copy, distribute and modify it.

pkgname=tarlz
pkgversion=0.4
pkgversion=0.8
progname=tarlz
srctrigger=doc/${pkgname}.texi

@@ -70,6 +70,7 @@ while [ $# != 0 ] ; do
		echo "  CXX=COMPILER          C++ compiler to use [${CXX}]"
		echo "  CPPFLAGS=OPTIONS      command line options for the preprocessor [${CPPFLAGS}]"
		echo "  CXXFLAGS=OPTIONS      command line options for the C++ compiler [${CXXFLAGS}]"
		echo "  CXXFLAGS+=OPTIONS     append options to the current value of CXXFLAGS"
		echo "  LDFLAGS=OPTIONS       command line options for the linker [${LDFLAGS}]"
		echo
		exit 0 ;;
@@ -93,10 +94,11 @@ while [ $# != 0 ] ; do
	--mandir=*) mandir=${optarg} ;;
	--no-create) no_create=yes ;;

	CXX=*)      CXX=${optarg} ;;
	CPPFLAGS=*) CPPFLAGS=${optarg} ;;
	CXXFLAGS=*) CXXFLAGS=${optarg} ;;
	LDFLAGS=*)  LDFLAGS=${optarg} ;;
	CXX=*)       CXX=${optarg} ;;
	CPPFLAGS=*)  CPPFLAGS=${optarg} ;;
	CXXFLAGS=*)  CXXFLAGS=${optarg} ;;
	CXXFLAGS+=*) CXXFLAGS="${CXXFLAGS} ${optarg}" ;;
	LDFLAGS=*)   LDFLAGS=${optarg} ;;

	--*)
		echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
306 create.cc
@@ -28,6 +28,10 @@
#include <stdint.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#if defined(__GNU_LIBRARY__)
#include <sys/sysmacros.h>		// for major, minor
#endif
#include <ftw.h>
#include <grp.h>
#include <pwd.h>
@@ -37,6 +41,9 @@
#include "lzip.h"
#include "tarlz.h"


const CRC32C crc32c;

int cl_owner = -1;		// global vars needed by add_member
int cl_group = -1;
int cl_solid = 0;		// 1 = dsolid, 2 = asolid, 3 = solid
@@ -44,6 +51,7 @@ int cl_solid = 0; // 1 = dsolid, 2 = asolid, 3 = solid
namespace {

LZ_Encoder * encoder = 0;	// local vars needed by add_member
const char * archive_namep = 0;
int outfd = -1;
int gretval = 0;

@@ -55,31 +63,67 @@ int seek_read( const int fd, uint8_t * const buf, const int size,
  return 0;
  }

// Check archive type, remove EOF blocks, and leave outfd file pos at EOF
bool check_appendable()
// infd and outfd can refer to the same file if copying to a lower file
// position or if source and destination blocks don't overlap.
// max_size < 0 means no size limit.
bool copy_file( const int infd, const int outfd, const long long max_size = -1 )
  {
  const int buffer_size = 65536;
  // remaining number of bytes to copy
  long long rest = ( ( max_size >= 0 ) ? max_size : buffer_size );
  long long copied_size = 0;
  uint8_t * const buffer = new uint8_t[buffer_size];
  bool error = false;

  while( rest > 0 )
    {
    const int size = std::min( (long long)buffer_size, rest );
    if( max_size >= 0 ) rest -= size;
    const int rd = readblock( infd, buffer, size );
    if( rd != size && errno )
      { show_error( "Error reading input file", errno ); error = true; break; }
    if( rd > 0 )
      {
      const int wr = writeblock( outfd, buffer, rd );
      if( wr != rd )
        { show_error( "Error writing output file", errno );
          error = true; break; }
      copied_size += rd;
      }
    if( rd < size ) break;				// EOF
    }
  delete[] buffer;
  return ( !error && ( max_size < 0 || copied_size == max_size ) );
  }


/* Check archive type. If success, leave fd file pos at 0.
   If remove_eof, leave fd file pos at beginning of the EOF blocks. */
bool check_appendable( const int fd, const bool remove_eof )
  {
  struct stat st;
  if( fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) ) return false;
  uint8_t buf[header_size];
  int rd = readblock( outfd, buf, header_size );
  if( fstat( fd, &st ) != 0 || !S_ISREG( st.st_mode ) ) return false;
  if( lseek( fd, 0, SEEK_SET ) != 0 ) return false;
  enum { bufsize = header_size + ( header_size / 8 ) };
  uint8_t buf[bufsize];
  int rd = readblock( fd, buf, bufsize );
  if( rd == 0 && errno == 0 ) return true;	// append to empty archive
  if( rd < min_member_size || ( rd != header_size && errno ) ) return false;
  const Lzip_header * const p = (Lzip_header *)buf;	// shut up gcc
  if( rd < min_member_size || ( rd != bufsize && errno ) ) return false;
  const Lzip_header * const p = (const Lzip_header *)buf;	// shut up gcc
  if( !p->verify_magic() ) return false;
  LZ_Decoder * decoder = LZ_decompress_open();	// decompress first header
  if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ||
      LZ_decompress_write( decoder, buf, rd ) != rd ||
      ( rd = LZ_decompress_read( decoder, buf, header_size ) ) <
        magic_o + magic_l )
      ( rd = LZ_decompress_read( decoder, buf, header_size ) ) != header_size )
    { LZ_decompress_close( decoder ); return false; }
  LZ_decompress_close( decoder );
  const bool maybe_eof = ( buf[0] == 0 );
  if( !verify_ustar_chksum( buf ) && !maybe_eof ) return false;
  const long long end = lseek( outfd, 0, SEEK_END );
  const long long end = lseek( fd, 0, SEEK_END );
  if( end < min_member_size ) return false;

  Lzip_trailer trailer;
  if( seek_read( outfd, trailer.data, Lzip_trailer::size,
  if( seek_read( fd, trailer.data, Lzip_trailer::size,
                 end - Lzip_trailer::size ) != Lzip_trailer::size )
    return false;
  const long long member_size = trailer.member_size();
@@ -87,9 +131,8 @@ bool check_appendable()
      ( maybe_eof && member_size != end ) ) return false;

  Lzip_header header;
  if( seek_read( outfd, header.data, Lzip_header::size,
                 end - member_size ) != Lzip_header::size )
    return false;
  if( seek_read( fd, header.data, Lzip_header::size,
                 end - member_size ) != Lzip_header::size ) return false;
  if( !header.verify_magic() || !isvalid_ds( header.dictionary_size() ) )
    return false;

@@ -102,12 +145,33 @@ bool check_appendable()
  crc ^= 0xFFFFFFFFU;
  if( crc != data_crc ) return false;

  if( lseek( outfd, end - member_size, SEEK_SET ) != end - member_size ||
      ftruncate( outfd, end - member_size ) != 0 ) return false;
  return true;
  const long long pos = remove_eof ? end - member_size : 0;
  return ( lseek( fd, pos, SEEK_SET ) == pos );
  }


class File_is_archive
  {
  dev_t archive_dev;
  ino_t archive_ino;
  bool initialized;
public:
  File_is_archive() : initialized( false ) {}
  bool init()
    {
    struct stat st;
    if( fstat( outfd, &st ) != 0 ) return false;
    if( S_ISREG( st.st_mode ) )
      { archive_dev = st.st_dev; archive_ino = st.st_ino; initialized = true; }
    return true;
    }
  bool operator()( const struct stat & st ) const
    {
    return initialized && archive_dev == st.st_dev && archive_ino == st.st_ino;
    }
  } file_is_archive;


bool archive_write( const uint8_t * const buf, const int size )
  {
  if( !encoder )				// uncompressed
@@ -121,9 +185,10 @@ bool archive_write( const uint8_t * const buf, const int size )
      const int wr = LZ_compress_write( encoder, buf + sz, size - sz );
      if( wr < 0 ) internal_error( "library error (LZ_compress_write)." );
      sz += wr;
      if( sz >= size && size > 0 ) break;	// minimize dictionary size
    const int rd = LZ_compress_read( encoder, obuf, obuf_size );
    if( rd < 0 ) internal_error( "library error (LZ_compress_read)." );
    if( rd == 0 && sz == size ) break;
    if( rd == 0 && sz >= size ) break;
    if( writeblock( outfd, obuf, rd ) != rd ) return false;
    }
  if( LZ_compress_finished( encoder ) == 1 &&
@@ -133,11 +198,98 @@ bool archive_write( const uint8_t * const buf, const int size )
  }


void init_tar_header( Tar_header header )	// set magic and version
  {
  std::memset( header, 0, header_size );
  std::memcpy( header + magic_o, ustar_magic, magic_l - 1 );
  header[version_o] = header[version_o+1] = '0';
  }


unsigned char xdigit( const unsigned value )
  {
  if( value <= 9 ) return '0' + value;
  if( value <= 15 ) return 'A' + value - 10;
  return 0;
  }

void print_hex( char * const buf, int size, unsigned long long num )
  {
  while( --size >= 0 ) { buf[size] = xdigit( num & 0x0F ); num >>= 4; }
  }

void print_octal( char * const buf, int size, unsigned long long num )
  {
  while( --size >= 0 ) { buf[size] = '0' + ( num % 8 ); num /= 8; }
  }

unsigned decimal_digits( unsigned long long value )
  {
  unsigned digits = 1;
  while( value >= 10 ) { value /= 10; ++digits; }
  return digits;
  }

unsigned long long record_size( const unsigned keyword_size,
                                const unsigned long long value_size )
  {
  // size = ' ' + keyword + '=' + value + '\n'
  const unsigned long long size = 1 + keyword_size + 1 + value_size + 1;
  const unsigned d1 = decimal_digits( size );
  return decimal_digits( d1 + size ) + size;
  }

bool write_extended( const Extended & extended )
  {
  const int path_rec = extended.path.size() ?
                       record_size( 4, extended.path.size() ) : 0;
  const int lpath_rec = extended.linkpath.size() ?
                        record_size( 8, extended.linkpath.size() ) : 0;
  const int size_rec = ( extended.size > 0 ) ?
                       record_size( 4, decimal_digits( extended.size ) ) : 0;
  const unsigned long long edsize = path_rec + lpath_rec + size_rec + 22;
  const unsigned long long bufsize = round_up( edsize );
  if( edsize >= 1ULL << 33 ) return false;	// too much extended data
  if( bufsize == 0 ) return edsize == 0;	// overflow or no extended data
  char * const buf = new char[bufsize+1];	// extended records buffer
  unsigned long long pos = path_rec;		// goto can't cross this
  if( path_rec && snprintf( buf, path_rec + 1, "%d path=%s\n",
                            path_rec, extended.path.c_str() ) != path_rec )
    goto error;
  if( lpath_rec && snprintf( buf + pos, lpath_rec + 1, "%d linkpath=%s\n",
                 lpath_rec, extended.linkpath.c_str() ) != lpath_rec )
    goto error;
  pos += lpath_rec;
  if( size_rec && snprintf( buf + pos, size_rec + 1, "%d size=%llu\n",
                            size_rec, extended.size ) != size_rec )
    goto error;
  pos += size_rec;
  if( snprintf( buf + pos, 23, "22 GNU.crc32=00000000\n" ) != 22 ) goto error;
  pos += 22;
  if( pos != edsize ) goto error;
  print_hex( buf + edsize - 9, 8,
             crc32c.windowed_crc( (const uint8_t *)buf, edsize - 9, edsize ) );
  std::memset( buf + edsize, 0, bufsize - edsize );	// wipe padding
  Tar_header header;					// extended header
  init_tar_header( header );
  header[typeflag_o] = tf_extended;		// fill only required fields
  print_octal( header + size_o, size_l - 1, edsize );
  print_octal( header + chksum_o, chksum_l - 1,
               ustar_chksum( (const uint8_t *)header ) );
  if( !archive_write( (const uint8_t *)header, header_size ) ) goto error;
  for( pos = 0; pos < bufsize; )	// write extended records to archive
    {
    int size = std::min( bufsize - pos, 1ULL << 20 );
    if( !archive_write( (const uint8_t *)buf + pos, size ) ) goto error;
    pos += size;
    }
  delete[] buf;
  return true;
error:
  delete[] buf;
  return false;
  }


const char * remove_leading_dotdot( const char * const filename )
  {
@@ -164,24 +316,31 @@ const char * remove_leading_dotdot( const char * const filename )
  }


bool split_name( const char * const filename, Tar_header header )
// Return true if filename fits in the ustar header.
bool store_name( const char * const filename, Extended & extended,
                 Tar_header header )
  {
  const char * const stored_name = remove_leading_dotdot( filename );
  const int len = std::strlen( stored_name );
  enum { max_len = prefix_l + 1 + name_l };	// prefix + '/' + name

  // first try storing filename in the ustar header
  if( len <= name_l )				// stored_name fits in name
    { std::memcpy( header + name_o, stored_name, len ); return true; }
  if( len <= max_len )				// find shortest prefix
    for( int i = len - name_l - 1; i < len && i <= prefix_l; ++i )
      if( stored_name[i] == '/' )
      if( stored_name[i] == '/' )		// stored_name can be split
        {
        std::memcpy( header + name_o, stored_name + i + 1, len - i - 1 );
        std::memcpy( header + prefix_o, stored_name, i );
        return true;
        }
  // store filename in extended record, leave name zeroed in ustar header
  extended.path = stored_name;
  return false;
  }


int add_member( const char * const filename, const struct stat *,
                const int flag, struct FTW * )
  {
@@ -189,11 +348,13 @@ int add_member( const char * const filename, const struct stat *,
  if( lstat( filename, &st ) != 0 )
    { show_file_error( filename, "Can't stat input file", errno );
      gretval = 1; return 0; }
  if( file_is_archive( st ) )
    { show_file_error( archive_namep, "File is the archive; not dumped." );
      return 0; }
  Extended extended;			// metadata for extended records
  Tar_header header;
  std::memset( header, 0, header_size );
  if( !split_name( filename, header ) )
    { show_file_error( filename, "File name is too long." );
      gretval = 2; return 0; }
  init_tar_header( header );
  store_name( filename, extended, header );

  const mode_t mode = st.st_mode;
  print_octal( header + mode_o, mode_l - 1,
@@ -201,10 +362,17 @@ int add_member( const char * const filename, const struct stat *,
                      S_IRWXU | S_IRWXG | S_IRWXO ) );
  const uid_t uid = ( cl_owner >= 0 ) ? (uid_t)cl_owner : st.st_uid;
  const gid_t gid = ( cl_group >= 0 ) ? (gid_t)cl_group : st.st_gid;
  if( uid >= 2 << 20 || gid >= 2 << 20 )
    { show_file_error( filename, "uid or gid is larger than 2_097_151." );
      gretval = 1; return 0; }
  print_octal( header + uid_o, uid_l - 1, uid );
  print_octal( header + gid_o, gid_l - 1, gid );
  const long long mtime = st.st_mtime;			// shut up gcc
  if( mtime < 0 || mtime >= 1LL << 33 )
    { show_file_error( filename, "mtime is out of ustar range [0, 8_589_934_591]." );
      gretval = 1; return 0; }
  print_octal( header + mtime_o, mtime_l - 1, mtime );
  unsigned long long file_size = 0;
  print_octal( header + mtime_o, mtime_l - 1, st.st_mtime );
  Typeflag typeflag;
  if( S_ISREG( mode ) ) { typeflag = tf_regular; file_size = st.st_size; }
  else if( S_ISDIR( mode ) )
@@ -217,16 +385,26 @@ int add_member( const char * const filename, const struct stat *,
  else if( S_ISLNK( mode ) )
    {
    typeflag = tf_symlink;
    if( st.st_size > linkname_l ||
        readlink( filename, header + linkname_o, linkname_l ) != st.st_size )
    long len;
    if( st.st_size <= linkname_l )
      len = readlink( filename, header + linkname_o, linkname_l );
    else
      {
      show_file_error( filename, "Link destination name is too long." );
      gretval = 2; return 0;
      char * const buf = new char[st.st_size+1];
      len = readlink( filename, buf, st.st_size );
      if( len == st.st_size ) { buf[len] = 0; extended.linkpath = buf; }
      delete[] buf;
      }
    if( len != st.st_size )
      { show_file_error( filename, "Error reading link", (len < 0) ? errno : 0 );
        gretval = 1; return 0; }
    }
  else if( S_ISCHR( mode ) || S_ISBLK( mode ) )
    {
    typeflag = S_ISCHR( mode ) ? tf_chardev : tf_blockdev;
    if( major( st.st_dev ) >= 2 << 20 || minor( st.st_dev ) >= 2 << 20 )
      { show_file_error( filename, "devmajor or devminor is larger than 2_097_151." );
        gretval = 1; return 0; }
    print_octal( header + devmajor_o, devmajor_l - 1, major( st.st_dev ) );
    print_octal( header + devminor_o, devminor_l - 1, minor( st.st_dev ) );
    }
@@ -234,22 +412,23 @@ int add_member( const char * const filename, const struct stat *,
  else { show_file_error( filename, "Unknown file type." );
         gretval = 2; return 0; }
  header[typeflag_o] = typeflag;
  std::memcpy( header + magic_o, ustar_magic, magic_l - 1 );
  header[version_o] = header[version_o+1] = '0';
  const struct passwd * const pw = getpwuid( uid );
  if( pw && pw->pw_name )
    std::strncpy( header + uname_o, pw->pw_name, uname_l - 1 );
  const struct group * const gr = getgrgid( gid );
  if( gr && gr->gr_name )
    std::strncpy( header + gname_o, gr->gr_name, gname_l - 1 );
  print_octal( header + size_o, size_l - 1, file_size );
  if( file_size >= 1ULL << 33 ) extended.size = file_size;
  else print_octal( header + size_o, size_l - 1, file_size );
  print_octal( header + chksum_o, chksum_l - 1,
               ustar_chksum( (const uint8_t *)header ) );

  const int infd = file_size ? open_instream( filename ) : -1;
  if( file_size && infd < 0 ) { gretval = 1; return 0; }
  if( !extended.empty() && !write_extended( extended ) )
    { show_error( "Error writing extended header", errno ); return 1; }
  if( !archive_write( (const uint8_t *)header, header_size ) )
    { show_error( "Error writing archive header", errno ); return 1; }
    { show_error( "Error writing ustar header", errno ); return 1; }
  if( file_size )
    {
    enum { bufsize = 32 * header_size };
@@ -304,6 +483,49 @@ bool verify_ustar_chksum( const uint8_t * const buf )
    ustar_chksum( buf ) == strtoul( (const char *)buf + chksum_o, 0, 8 ) ); }


int concatenate( const std::string & archive_name, const Arg_parser & parser,
                 const int filenames )
  {
  if( !filenames )
    { if( verbosity >= 1 ) show_error( "Nothing to concatenate." ); return 0; }
  if( archive_name.empty() )
    { show_error( "'--concatenate' is incompatible with '-f -'.", 0, true );
      return 1; }
  if( ( outfd = open_outstream( archive_name, false ) ) < 0 ) return 1;
  if( !file_is_archive.init() )
    { show_file_error( archive_name.c_str(), "Can't stat", errno ); return 1; }

  int retval = 0;
  for( int i = 0; i < parser.arguments(); ++i )	// copy archives
    {
    if( parser.code( i ) ) continue;		// skip options
    const char * const filename = parser.argument( i ).c_str();
    const int infd = open_instream( filename );
    if( infd < 0 )
      { show_file_error( filename, "Can't open input file", errno );
        retval = 1; break; }
    if( !check_appendable( infd, false ) )
      { show_file_error( filename, "Not an appendable tar.lz archive." );
        close( infd ); retval = 2; break; }
    struct stat st;
    if( fstat( infd, &st ) == 0 && file_is_archive( st ) )
      { show_file_error( filename, "File is the archive; not concatenated." );
        close( infd ); continue; }
    if( !check_appendable( outfd, true ) )
      { show_error( "This does not look like an appendable tar.lz archive." );
        close( infd ); retval = 2; break; }
    if( !copy_file( infd, outfd ) || close( infd ) != 0 )
      { show_file_error( filename, "Error copying archive", errno );
        retval = 1; break; }
    if( verbosity >= 1 ) std::fprintf( stderr, "%s\n", filename );
    }

  if( close( outfd ) != 0 && !retval )
    { show_error( "Error closing archive", errno ); retval = 1; }
  return retval;
  }


int encode( const std::string & archive_name, const Arg_parser & parser,
            const int filenames, const int level, const bool append )
  {
@@ -345,11 +567,15 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
      { show_error( "'--append' is incompatible with '--uncompressed'.", 0, true );
        return 1; }
    if( ( outfd = open_outstream( archive_name, false ) ) < 0 ) return 1;
    if( !check_appendable() )
    if( !check_appendable( outfd, true ) )
      { show_error( "This does not look like an appendable tar.lz archive." );
        return 2; }
    }

  archive_namep = archive_name.size() ? archive_name.c_str() : "(stdout)";
  if( !file_is_archive.init() )
    { show_file_error( archive_namep, "Can't stat", errno ); return 1; }

  if( compressed )
    {
    encoder = LZ_compress_open( option_mapping[level].dictionary_size,
@ -365,7 +591,6 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
|||
}
|
||||
|
||||
int retval = 0;
|
||||
std::string deslashed; // arg without trailing slashes
|
||||
for( int i = 0; i < parser.arguments(); ++i ) // write members
|
||||
{
|
||||
const int code = parser.code( i );
|
||||
|
@ -375,6 +600,7 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
|||
{ show_file_error( filename, "Error changing working directory", errno );
|
||||
retval = 1; break; }
|
||||
if( code ) continue; // skip options
|
||||
std::string deslashed; // arg without trailing slashes
|
||||
unsigned len = arg.size();
|
||||
while( len > 1 && arg[len-1] == '/' ) --len;
|
||||
if( len < arg.size() )
|
||||
|
@ -391,16 +617,18 @@ int encode( const std::string & archive_name, const Arg_parser & parser,
|
|||
|
||||
if( !retval ) // write End-Of-Archive records
|
||||
{
|
||||
uint8_t buf[header_size];
|
||||
std::memset( buf, 0, header_size );
|
||||
enum { bufsize = 2 * header_size };
|
||||
uint8_t buf[bufsize];
|
||||
std::memset( buf, 0, bufsize );
|
||||
if( encoder && cl_solid == 2 && !archive_write( 0, 0 ) ) // flush encoder
|
||||
{ show_error( "Error flushing encoder", errno ); retval = 1; }
|
||||
else if( !archive_write( buf, header_size ) ||
|
||||
!archive_write( buf, header_size ) ||
|
||||
else if( !archive_write( buf, bufsize ) ||
|
||||
( encoder && !archive_write( 0, 0 ) ) ) // flush encoder
|
||||
{ show_error( "Error writing end-of-archive blocks", errno );
|
||||
retval = 1; }
|
||||
}
|
||||
if( encoder && LZ_compress_close( encoder ) < 0 )
|
||||
{ show_error( "LZ_compress_close failed." ); retval = 1; }
|
||||
if( close( outfd ) != 0 && !retval )
|
||||
{ show_error( "Error closing archive", errno ); retval = 1; }
|
||||
if( retval && archive_name.size() && !append )
|
||||
|
|
41
doc/tarlz.1
|
@ -1,12 +1,25 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||
.TH TARLZ "1" "April 2018" "tarlz 0.4" "User Commands"
|
||||
.TH TARLZ "1" "December 2018" "tarlz 0.8" "User Commands"
|
||||
.SH NAME
|
||||
tarlz \- creates tar archives with multimember lzip compression
|
||||
.SH SYNOPSIS
|
||||
.B tarlz
|
||||
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||
.SH DESCRIPTION
|
||||
Tarlz \- Archiver with multimember lzip compression.
|
||||
Tarlz is a small and simple implementation of the tar archiver. By default
|
||||
tarlz creates, lists and extracts archives in a simplified posix pax format
|
||||
compressed with lzip on a per file basis. Each tar member is compressed in
|
||||
its own lzip member, as well as the end\-of\-file blocks. This method is fully
|
||||
backward compatible with standard tar tools like GNU tar, which treat the
|
||||
resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
|
||||
can append files to the end of such compressed archives.
|
||||
.PP
|
||||
The tarlz file format is a safe posix\-style backup format. In case of
|
||||
corruption, tarlz can extract all the undamaged members from the tar.lz
|
||||
archive, skipping over the damaged members, just like the standard
|
||||
(uncompressed) tar. Moreover, the option '\-\-keep\-damaged' can be used to
|
||||
recover as much data as possible from each damaged member, and lziprecover
|
||||
can be used to recover some of the damaged members.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-h\fR, \fB\-\-help\fR
|
||||
|
@ -15,6 +28,9 @@ display this help and exit
|
|||
\fB\-V\fR, \fB\-\-version\fR
|
||||
output version information and exit
|
||||
.TP
|
||||
\fB\-A\fR, \fB\-\-concatenate\fR
|
||||
append tar.lz archives to the end of an archive
|
||||
.TP
|
||||
\fB\-c\fR, \fB\-\-create\fR
|
||||
create a new archive
|
||||
.TP
|
||||
|
@ -48,17 +64,29 @@ create solidly compressed appendable archive
|
|||
\fB\-\-dsolid\fR
|
||||
create per\-directory compressed archive
|
||||
.TP
|
||||
\fB\-\-no\-solid\fR
|
||||
create per\-file compressed archive (default)
|
||||
.TP
|
||||
\fB\-\-solid\fR
|
||||
create solidly compressed archive
|
||||
.TP
|
||||
\fB\-\-group=\fR<group>
|
||||
use <group> name/id for added files
|
||||
\fB\-\-anonymous\fR
|
||||
equivalent to '\-\-owner=root \fB\-\-group\fR=\fI\,root\/\fR'
|
||||
.TP
|
||||
\fB\-\-owner=\fR<owner>
|
||||
use <owner> name/id for added files
|
||||
use <owner> name/ID for files added
|
||||
.TP
|
||||
\fB\-\-group=\fR<group>
|
||||
use <group> name/ID for files added
|
||||
.TP
|
||||
\fB\-\-keep\-damaged\fR
|
||||
don't delete partially extracted files
|
||||
.TP
|
||||
\fB\-\-missing\-crc\fR
|
||||
exit with error status if missing extended CRC
|
||||
.TP
|
||||
\fB\-\-uncompressed\fR
|
||||
don't compress the created archive
|
||||
don't compress the archive created
|
||||
.PP
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file
|
||||
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
|
||||
|
@ -70,6 +98,7 @@ Report bugs to lzip\-bug@nongnu.org
|
|||
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
|
||||
.SH COPYRIGHT
|
||||
Copyright \(co 2018 Antonio Diaz Diaz.
|
||||
Using lzlib 1.11\-rc2
|
||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
|
|
498
doc/tarlz.info
|
@ -11,15 +11,17 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Tarlz Manual
|
||||
************
|
||||
|
||||
This manual is for Tarlz (version 0.4, 23 April 2018).
|
||||
This manual is for Tarlz (version 0.8, 16 December 2018).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command line interface
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command line interface
|
||||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||
|
@ -34,38 +36,17 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
|
|||
**************
|
||||
|
||||
Tarlz is a small and simple implementation of the tar archiver. By
|
||||
default tarlz creates, lists and extracts archives in the 'ustar' format
|
||||
compressed with lzip on a per file basis. Tarlz can append files to the
|
||||
end of such compressed archives.
|
||||
|
||||
Each tar member is compressed in its own lzip member, as well as the
|
||||
end-of-file blocks. This same method works for any tar format (gnu,
|
||||
ustar, posix) and is fully backward compatible with standard tar tools
|
||||
default tarlz creates, lists and extracts archives in a simplified
|
||||
posix pax format compressed with lzip on a per file basis. Each tar
|
||||
member is compressed in its own lzip member, as well as the end-of-file
|
||||
blocks. This method is fully backward compatible with standard tar tools
|
||||
like GNU tar, which treat the resulting multimember tar.lz archive like
|
||||
any other tar.lz archive.
|
||||
any other tar.lz archive. Tarlz can append files to the end of such
|
||||
compressed archives.
|
||||
|
||||
Tarlz can create tar archives with four levels of compression
|
||||
granularity; per file, per directory, appendable solid, and solid.
|
||||
|
||||
Tarlz is intended as a showcase project for the maintainers of real
|
||||
tar programs to evaluate the format and perhaps implement it in their
|
||||
tools.
|
||||
|
||||
The diagram below shows the correspondence between tar members
|
||||
(formed by a header plus optional data) in the tar archive and lzip
|
||||
members in the resulting multimember tar.lz archive: *Note File format:
|
||||
(lzip)File format.
|
||||
|
||||
tar
|
||||
+========+======+========+======+========+======+========+
|
||||
| header | data | header | data | header | data | eof |
|
||||
+========+======+========+======+========+======+========+
|
||||
|
||||
tar.lz
|
||||
+===============+===============+===============+========+
|
||||
| member | member | member | member |
|
||||
+===============+===============+===============+========+
|
||||
|
||||
Of course, compressing each file (or each directory) individually is
|
||||
less efficient than compressing the whole tar archive, but it has the
|
||||
following advantages:
|
||||
|
@ -73,21 +54,32 @@ following advantages:
|
|||
* The resulting multimember tar.lz archive can be decompressed in
|
||||
parallel with plzip, multiplying the decompression speed.
|
||||
|
||||
* New members can be appended to the archive (by removing the eof
|
||||
* New members can be appended to the archive (by removing the EOF
|
||||
member) just like to an uncompressed tar archive.
|
||||
|
||||
* It is a safe posix-style backup format. In case of corruption,
|
||||
tarlz can extract all the undamaged members from the tar.lz
|
||||
archive, skipping over the damaged members, just like the standard
|
||||
(uncompressed) tar. Moreover, lziprecover can be used to recover at
|
||||
least part of the contents of the damaged members.
|
||||
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
|
||||
used to recover as much data as possible from each damaged member,
|
||||
and lziprecover can be used to recover some of the damaged members.
|
||||
|
||||
* A multimember tar.lz archive is usually smaller than the
|
||||
corresponding solidly compressed tar.gz archive, except when
|
||||
individually compressing files smaller than about 32 KiB.
|
||||
|
||||
Tarlz protects the extended records with a CRC in a way compatible
|
||||
with standard tar tools. *Note crc32::.
|
||||
|
||||
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
|
||||
'star' or 'v7'.
|
||||
|
||||
Tarlz is intended as a showcase project for the maintainers of real
|
||||
tar programs to evaluate the format and perhaps implement it in their
|
||||
tools.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Invoking tarlz, Next: Examples, Prev: Introduction, Up: Top
|
||||
File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
|
||||
|
||||
2 Invoking tarlz
|
||||
****************
|
||||
|
@ -97,9 +89,15 @@ The format for running tarlz is:
|
|||
tarlz [OPTIONS] [FILES]
|
||||
|
||||
On archive creation or appending, tarlz removes leading and trailing
|
||||
slashes from file names, as well as file name prefixes containing a
|
||||
'..' component. On extraction, archive members containing a '..'
|
||||
component are skipped.
|
||||
slashes from filenames, as well as filename prefixes containing a '..'
|
||||
component. On extraction, archive members containing a '..' component
|
||||
are skipped. Tarlz detects when the archive being created or enlarged
|
||||
is among the files to be dumped, appended or concatenated, and skips it.
|
||||
|
||||
On extraction and listing, tarlz removes leading './' strings from
|
||||
member names in the archive or given in the command line, so that
|
||||
'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
|
||||
archive 'foo'.
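The './' normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `strip_dot_slash` is hypothetical and is not taken from the tarlz sources, and the real implementation may treat corner cases (such as repeated slashes) differently.

```cpp
#include <string>

// Hypothetical helper: remove every leading "./" from a member name,
// so that "./bar" and "bar" compare equal.
std::string strip_dot_slash( std::string name )
  {
  while( name.compare( 0, 2, "./" ) == 0 ) name.erase( 0, 2 );
  return name;
  }
```

Applying the same function both to the names stored in the archive and to the names given on the command line is what would let './bar' select member 'bar', and 'baz' select member './baz'.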
|
||||
|
||||
tarlz supports the following options:
|
||||
|
||||
|
@ -110,10 +108,22 @@ component are skipped.
|
|||
'-V'
|
||||
'--version'
|
||||
Print the version number of tarlz on the standard output and exit.
|
||||
This version number should be included in all bug reports.
|
||||
|
||||
'-A'
|
||||
'--concatenate'
|
||||
Append tar.lz archives to the end of a tar.lz archive. All the
|
||||
archives involved must be regular (seekable) files compressed as
|
||||
multimember lzip files, and the two end-of-file blocks plus any
|
||||
zero padding must be contained in the last lzip member of each
|
||||
archive. The intermediate end-of-file blocks are removed as each
|
||||
new archive is concatenated. Exit with status 0 without modifying
|
||||
the archive if no FILES have been specified. Tarlz can't
|
||||
concatenate uncompressed tar archives.
|
||||
|
||||
'-c'
|
||||
'--create'
|
||||
Create a new archive.
|
||||
Create a new archive from FILES.
|
||||
|
||||
'-C DIR'
|
||||
'--directory=DIR'
|
||||
|
@ -137,18 +147,19 @@ component are skipped.
|
|||
|
||||
'-r'
|
||||
'--append'
|
||||
Append files to the end of an archive. The archive must be a
|
||||
Append files to the end of a tar.lz archive. The archive must be a
|
||||
regular (seekable) file compressed as a multimember lzip file, and
|
||||
the two end-of-file blocks plus any zero padding must be contained
|
||||
in the last lzip member of the archive. First this last member is
|
||||
removed, then the new members are appended, and then a new
|
||||
end-of-file member is appended to the archive. Exit with status 0
|
||||
without modifying the archive if no FILES have been specified.
|
||||
tarlz can't append files to an uncompressed tar archive.
|
||||
Tarlz can't append files to an uncompressed tar archive.
|
||||
|
||||
'-t'
|
||||
'--list'
|
||||
List the contents of an archive.
|
||||
List the contents of an archive. If FILES are given, list only the
|
||||
given FILES.
|
||||
|
||||
'-v'
|
||||
'--verbose'
|
||||
|
@ -156,10 +167,14 @@ component are skipped.
|
|||
|
||||
'-x'
|
||||
'--extract'
|
||||
Extract files from an archive.
|
||||
Extract files from an archive. If FILES are given, extract only
|
||||
the given FILES. Else extract all the files in the archive.
|
||||
|
||||
'-0 .. -9'
|
||||
Set the compression level. The default compression level is '-6'.
|
||||
Like lzip, tarlz also minimizes the dictionary size of the lzip
|
||||
members it creates, reducing the amount of memory required for
|
||||
decompression.
|
||||
|
||||
'--asolid'
|
||||
When creating or appending to a compressed archive, use appendable
|
||||
|
@ -175,22 +190,49 @@ component are skipped.
|
|||
creates a compressed appendable archive with a separate lzip
|
||||
member for each top-level directory.
|
||||
|
||||
'--no-solid'
|
||||
When creating or appending to a compressed archive, compress each
|
||||
file separately. The end-of-file blocks are compressed into a
|
||||
separate lzip member. This creates a compressed appendable archive
|
||||
with a separate lzip member for each file. This option allows
|
||||
tarlz to revert to default behavior if, for example, tarlz is invoked
|
||||
through an alias like 'tar='tarlz --solid''.
|
||||
|
||||
'--solid'
|
||||
When creating or appending to a compressed archive, use solid
|
||||
compression. The files being added to the archive, along with the
|
||||
end-of-file blocks, are compressed into a single lzip member. The
|
||||
resulting archive is not appendable. No more files can be later
|
||||
appended to the archive without decompressing it first.
|
||||
appended to the archive.
|
||||
|
||||
'--anonymous'
|
||||
Equivalent to '--owner=root --group=root'.
|
||||
|
||||
'--owner=OWNER'
|
||||
When creating or appending, use OWNER for files added to the
|
||||
archive. If OWNER is not a valid user name, it is decoded as a
|
||||
decimal numeric user ID.
|
||||
|
||||
'--group=GROUP'
|
||||
When creating or appending, use GROUP for files added to the
|
||||
archive. If GROUP is not a valid group name, it is decoded as a
|
||||
decimal numeric group ID.
|
||||
|
||||
'--owner=OWNER'
|
||||
When creating or appending, use OWNER for files added to the
|
||||
archive. If OWNER is not a valid user name, it is decoded as a
|
||||
decimal numeric user ID.
|
||||
'--keep-damaged'
|
||||
Don't delete partially extracted files. If a decompression error
|
||||
happens while extracting a file, keep the partial data extracted.
|
||||
Use this option to recover as much data as possible from each
|
||||
damaged member.
|
||||
|
||||
'--missing-crc'
|
||||
Exit with error status 2 if the CRC of the extended records is
|
||||
missing. When this option is used, tarlz detects any corruption
|
||||
in the extended records (only limited by CRC collisions). But note
|
||||
that a corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is
|
||||
reported as a missing CRC instead of as a corrupt record. This
|
||||
misleading 'Missing CRC' message is the consequence of a flaw in
|
||||
the posix pax format; i.e., the lack of a mandatory check sequence
|
||||
in the extended records. *Note crc32::.
|
||||
|
||||
'--uncompressed'
|
||||
With '--create', don't compress the created tar archive. Create an
|
||||
|
@ -203,9 +245,337 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
|
|||
caused tarlz to panic.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Invoking tarlz, Up: Top
|
||||
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Invoking tarlz, Up: Top
|
||||
|
||||
3 A small tutorial with examples
|
||||
3 File format
|
||||
*************
|
||||
|
||||
In the diagram below, a box like this:
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
|
||||
represents one byte; a box like this:
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
|
||||
represents a variable number of bytes or a fixed but large number of
|
||||
bytes (for example 512).
|
||||
|
||||
|
||||
A tar.lz file consists of a series of lzip members (compressed data
|
||||
sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them.
|
||||
|
||||
Each lzip member contains one or more tar members in a simplified
|
||||
posix pax interchange format; the only pax typeflag value supported by
|
||||
tarlz (in addition to the typeflag values defined by the ustar format)
|
||||
is 'x'. The pax format is an extension on top of the ustar format that
|
||||
removes the size limitations of the ustar format.
|
||||
|
||||
Each tar member contains one file archived, and is represented by the
|
||||
following sequence:
|
||||
|
||||
* An optional extended header block with extended header records.
|
||||
This header block is of the form described in pax header block,
|
||||
with a typeflag value of 'x'. The extended header records are
|
||||
included as the data for this header block.
|
||||
|
||||
* A header block in ustar format that describes the file. Any fields
|
||||
defined in the preceding optional extended header records override
|
||||
the associated fields in this header block for this file.
|
||||
|
||||
* Zero or more blocks that contain the contents of the file.
|
||||
|
||||
At the end of the archive file there are two 512-byte blocks filled
|
||||
with binary zeros, interpreted as an end-of-archive indicator. These EOF
|
||||
blocks are either compressed in a separate lzip member or compressed
|
||||
along with the tar members contained in the last lzip member.
|
||||
|
||||
The diagram below shows the correspondence between each tar member
|
||||
(formed by one or two headers plus optional data) in the tar archive and
|
||||
each lzip member in the resulting multimember tar.lz archive: *Note
|
||||
File format: (lzip)File format.
|
||||
|
||||
tar
|
||||
+========+======+=================+===============+========+======+========+
|
||||
| header | data | extended header | extended data | header | data | EOF |
|
||||
+========+======+=================+===============+========+======+========+
|
||||
|
||||
tar.lz
|
||||
+===============+=================================================+========+
|
||||
| member | member | member |
|
||||
+===============+=================================================+========+
|
||||
|
||||
|
||||
3.1 Pax header block
|
||||
====================
|
||||
|
||||
The pax header block is identical to the ustar header block described
|
||||
below except that the typeflag has the value 'x' (extended). The size
|
||||
field is the size of the extended header data in bytes. Most other
|
||||
fields in the pax header block are zeroed on archive creation to
|
||||
prevent trouble if the archive is read by a ustar tool, and are
|
||||
ignored by tarlz on archive extraction. *Note flawed-compat::.
|
||||
|
||||
The pax extended header data consists of one or more records, each of
|
||||
them constructed as follows:
|
||||
'"%d %s=%s\n", <length>, <keyword>, <value>'
|
||||
|
||||
The <length>, <blank>, <keyword>, <equals-sign>, and <newline> in the
|
||||
record must be limited to the portable character set. The <length> field
|
||||
contains the decimal length of the record in bytes, including the
|
||||
trailing <newline>. The <value> field is stored as-is, without
|
||||
conversion to UTF-8 nor any other transformation.
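Because <length> counts its own decimal digits, building a record needs a small fixed-point search. The following is a minimal sketch of that calculation; the function name `make_pax_record` is an assumption made for illustration and does not come from the tarlz sources.

```cpp
#include <string>

// Build one pax extended record: "<length> <keyword>=<value>\n".
// <length> is the size of the whole record, including its own digits.
std::string make_pax_record( const std::string & keyword,
                             const std::string & value )
  {
  // bytes independent of the length field: ' ' keyword '=' value '\n'
  const unsigned long rest = keyword.size() + value.size() + 3;
  unsigned long len = rest + 1;			// first guess: 1-digit length
  while( std::to_string( len ).size() + rest != len )
    len = std::to_string( len ).size() + rest;	// retry with more digits
  return std::to_string( len ) + ' ' + keyword + '=' + value + '\n';
  }
```

For example, keyword 'GNU.crc32' with an 8-character value gives a 22-byte record, which agrees with the '22 GNU.crc32=00000000\n' form shown in this chapter.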
|
||||
|
||||
These are the <keyword> fields currently supported by tarlz:
|
||||
|
||||
'linkpath'
|
||||
The pathname of a link being created to another file, of any type,
|
||||
previously archived. This record overrides the linkname field in
|
||||
the following ustar header block. The following ustar header block
|
||||
determines the type of link created. If typeflag of the following
|
||||
header block is 1, it will be a hard link. If typeflag is 2, it
|
||||
will be a symbolic link and the linkpath value will be used as the
|
||||
contents of the symbolic link.
|
||||
|
||||
'path'
|
||||
The pathname of the following file. This record overrides the name
|
||||
and prefix fields in the following ustar header block.
|
||||
|
||||
'size'
|
||||
The size of the file in bytes, expressed as a decimal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard. This record
|
||||
overrides the size field in the following ustar header block. The
|
||||
size record is used only for files with a size value greater than
|
||||
8_589_934_591 (octal 77777777777). This is 2^33 bytes or larger.
|
||||
|
||||
'GNU.crc32'
|
||||
CRC32-C (Castagnoli) of the extended header data excluding the 8
|
||||
bytes representing the CRC <value> itself. The <value> is
|
||||
represented as 8 hexadecimal digits in big endian order,
|
||||
'22 GNU.crc32=00000000\n'. The keyword of the CRC record is
|
||||
protected by the CRC to guarantee that corruption is always detected
|
||||
(except in case of CRC collision). A CRC was chosen because a
|
||||
checksum is too weak for a potentially large list of variable
|
||||
sized records. A checksum can't detect simple errors like the
|
||||
swapping of two bytes.
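For reference, a bitwise CRC32-C can be sketched as below. This only illustrates the Castagnoli CRC itself (reflected polynomial 0x82F63B78). The function names are hypothetical, the use of uppercase hexadecimal digits is an assumption, and the way tarlz skips the 8 bytes of the CRC <value> while computing the CRC is not shown here.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78.
uint32_t crc32c( const std::string & data )
  {
  uint32_t crc = 0xFFFFFFFFU;
  for( const unsigned char byte : data )
    {
    crc ^= byte;
    for( int k = 0; k < 8; ++k )
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0x82F63B78U : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFU;
  }

// Format a CRC as 8 hexadecimal digits, most significant digit first.
// (Uppercase digits are an assumption of this sketch.)
std::string crc_to_hex( const uint32_t crc )
  {
  char buf[9];
  std::snprintf( buf, sizeof buf, "%08lX", (unsigned long)crc );
  return buf;
  }
```

The standard check value for CRC32-C is 0xE3069283 for the ASCII string "123456789", which is a convenient way to verify any implementation.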
|
||||
|
||||
|
||||
3.2 Ustar header block
|
||||
======================
|
||||
|
||||
The ustar header block has a length of 512 bytes and is structured as
|
||||
shown in the following table. All lengths and offsets are in decimal.
|
||||
|
||||
Field Name Offset Length (in bytes)
|
||||
name 0 100
|
||||
mode 100 8
|
||||
uid 108 8
|
||||
gid 116 8
|
||||
size 124 12
|
||||
mtime 136 12
|
||||
chksum 148 8
|
||||
typeflag 156 1
|
||||
linkname 157 100
|
||||
magic 257 6
|
||||
version 263 2
|
||||
uname 265 32
|
||||
gname 297 32
|
||||
devmajor 329 8
|
||||
devminor 337 8
|
||||
prefix 345 155
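The table above can be expressed as a plain struct whose layout is checked at compile time. This is a sketch for illustration only; the struct name `Ustar_header` and the explicit `padding` member (the 12 bytes between the end of prefix and byte 512) are assumptions, not the representation used by tarlz.

```cpp
#include <cstddef>

// One 512-byte ustar header block; every member is a raw byte array.
struct Ustar_header
  {
  char name[100];
  char mode[8];
  char uid[8];
  char gid[8];
  char size[12];
  char mtime[12];
  char chksum[8];
  char typeflag;
  char linkname[100];
  char magic[6];
  char version[2];
  char uname[32];
  char gname[32];
  char devmajor[8];
  char devminor[8];
  char prefix[155];
  char padding[12];		// unused tail of the block (assumed name)
  };

static_assert( offsetof( Ustar_header, chksum ) == 148, "bad chksum offset" );
static_assert( offsetof( Ustar_header, typeflag ) == 156, "bad typeflag offset" );
static_assert( offsetof( Ustar_header, magic ) == 257, "bad magic offset" );
static_assert( offsetof( Ustar_header, prefix ) == 345, "bad prefix offset" );
static_assert( sizeof( Ustar_header ) == 512, "header must be 512 bytes" );
```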
|
||||
|
||||
All characters in the header block are coded using the ISO/IEC
|
||||
646:1991 (ASCII) standard, except in fields storing names for files,
|
||||
users, and groups. For maximum portability between implementations,
|
||||
names should only contain characters from the portable filename
|
||||
character set. But if an implementation supports the use of characters
|
||||
outside of '/' and the portable filename character set in names for
|
||||
files, users, and groups, tarlz will use the byte values in these names
|
||||
unmodified.
|
||||
|
||||
The fields name, linkname, and prefix are null-terminated character
|
||||
strings except when all characters in the array contain non-null
|
||||
characters including the last character.
|
||||
|
||||
The name and the prefix fields produce the pathname of the file. A
|
||||
new pathname is formed, if prefix is not an empty string (its first
|
||||
character is not null), by concatenating prefix (up to the first null
|
||||
character), a <slash> character, and name; otherwise, name is used
|
||||
alone. In either case, name is terminated at the first null character.
|
||||
If prefix begins with a null character, it is ignored. In this manner,
|
||||
pathnames of at most 256 characters can be supported. If a pathname does
|
||||
not fit in the space provided, an extended record is used to store the
|
||||
pathname.
|
||||
|
||||
The linkname field does not use the prefix to produce a pathname. If
|
||||
the linkname does not fit in the 100 characters provided, an extended
|
||||
record is used to store the linkname.
|
||||
|
||||
The mode field provides 12 access permission bits. The following
|
||||
table shows the symbolic name of each bit and its octal value:
|
||||
|
||||
Bit Name Bit value
|
||||
S_ISUID 04000
|
||||
S_ISGID 02000
|
||||
S_ISVTX 01000
|
||||
S_IRUSR 00400
|
||||
S_IWUSR 00200
|
||||
S_IXUSR 00100
|
||||
S_IRGRP 00040
|
||||
S_IWGRP 00020
|
||||
S_IXGRP 00010
|
||||
S_IROTH 00004
|
||||
S_IWOTH 00002
|
||||
S_IXOTH 00001
|
||||
|
||||
The uid and gid fields are the user and group ID of the owner and
|
||||
group of the file, respectively.
|
||||
|
||||
The size field contains the octal representation of the size of the
|
||||
file in bytes. If the typeflag field specifies a file of type '0'
|
||||
(regular file) or '7' (high performance regular file), the number of
|
||||
logical records following the header is (size / 512) rounded to the next
|
||||
integer. For all other values of typeflag, tarlz either sets the size
|
||||
field to 0 or ignores it, and does not store or expect any logical
|
||||
records following the header. If the file size is larger than
|
||||
8_589_934_591 bytes (octal 77777777777), an extended record is used to
|
||||
store the file size.
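The two size rules above amount to simple arithmetic, sketched here with hypothetical helper names (`data_blocks` and `needs_size_record` are not taken from the tarlz sources).

```cpp
// Number of 512-byte logical records that follow the header of a
// regular file: size / 512 rounded up to the next integer.
unsigned long long data_blocks( const unsigned long long size )
  { return ( size + 511 ) / 512; }

// True if the size does not fit in the 11 octal digits of the ustar
// size field, so that a 'size' extended record is required.
bool needs_size_record( const unsigned long long size )
  { return size > 077777777777ULL; }	// 8_589_934_591 == 2^33 - 1
```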
|
||||
|
||||
The mtime field contains the octal representation of the modification
|
||||
time of the file at the time it was archived, obtained from the stat()
|
||||
function.
|
||||
|
||||
The chksum field contains the octal representation of the value of
|
||||
the simple sum of all bytes in the header logical record. Each byte in
|
||||
the header is treated as an unsigned value. When calculating the
|
||||
checksum, the chksum field is treated as if it were all <space>
|
||||
characters.
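A minimal sketch of the checksum calculation described above, assuming the header is available as an array of 512 bytes. The function name `ustar_chksum` is hypothetical.

```cpp
#include <cstdint>

// Simple sum of all bytes of a ustar header, with the 8 bytes of the
// chksum field (offset 148) counted as if they were spaces.
unsigned ustar_chksum( const uint8_t * const header )
  {
  enum { header_size = 512, chksum_o = 148, chksum_l = 8 };
  unsigned sum = chksum_l * ' ';		// chksum field as 8 spaces
  for( int i = 0; i < chksum_o; ++i ) sum += header[i];
  for( int i = chksum_o + chksum_l; i < header_size; ++i ) sum += header[i];
  return sum;
  }
```

A reader would compare this value with the octal number stored in the chksum field. Note that a block of all zeros sums to 256 (8 spaces), so an end-of-archive block never passes for a valid header with a zeroed chksum field.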
|
||||
|
||||
The typeflag field contains a single character specifying the type of
|
||||
file archived:
|
||||
|
||||
''0''
|
||||
Regular file.
|
||||
|
||||
''1''
|
||||
Hard link to another file, of any type, previously archived.
|
||||
|
||||
''2''
|
||||
Symbolic link.
|
||||
|
||||
''3', '4''
|
||||
Character special file and block special file respectively. In
|
||||
this case the devmajor and devminor fields contain information
|
||||
defining the device in unspecified format.
|
||||
|
||||
''5''
|
||||
Directory.
|
||||
|
||||
''6''
|
||||
FIFO special file.
|
||||
|
||||
''7''
|
||||
Reserved to represent a file to which an implementation has
|
||||
associated some high-performance attribute. Tarlz treats this type
|
||||
of file as a regular file (type 0).
|
||||
|
||||
|
||||
The magic field contains the ASCII null-terminated string "ustar".
|
||||
The version field contains the characters "00" (0x30,0x30). The fields
|
||||
uname and gname are null-terminated character strings. Each numeric
|
||||
field contains a leading zero-filled, null-terminated octal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Amendments to pax format, Next: Examples, Prev: File format, Up: Top
|
||||
|
||||
4 The reasons for the differences with pax
|
||||
******************************************
|
||||
|
||||
Tarlz is meant to reliably detect invalid or corrupt metadata during
|
||||
extraction and to not create safety risks in the archives it creates. In
|
||||
order to achieve these goals, tarlz makes some changes to the variant
|
||||
of the pax format that it uses. This chapter describes these changes
|
||||
and the concrete reasons to implement them.
|
||||
|
||||
|
||||
4.1 Add a CRC of the extended records
|
||||
=====================================
|
||||
|
||||
The posix pax format has a serious flaw. The metadata stored in pax
|
||||
extended records are not protected by any kind of check sequence.
|
||||
Corruption in a long filename may cause the extraction of the file in
|
||||
the wrong place without warning. Corruption in a long file size may
|
||||
cause the truncation of the file or the appending of garbage to the
|
||||
file, both followed by a spurious warning about a corrupt header far
|
||||
from the place of the undetected corruption.
|
||||
|
||||
Metadata like filename and file size must be always protected in an
|
||||
archive format because of the adverse effects of undetected corruption
|
||||
in them, potentially much worse than undetected corruption in the data.
|
||||
Even more so in the case of pax because the amount of metadata it
|
||||
stores is potentially large, making undetected corruption more probable.
|
||||
|
||||
Because of the above, tarlz protects the extended records with a CRC
|
||||
in a way compatible with standard tar tools. *Note key_crc32::.
|
||||
|
||||
|
||||
4.2 Remove flawed backward compatibility
|
||||
========================================
|
||||
|
||||
In order to allow the extraction of pax archives by a tar utility
|
||||
conforming to the POSIX-2:1993 standard, POSIX.1-2008 recommends
|
||||
selecting extended header field values that allow such tar to create a
|
||||
regular file containing the extended header records as data. This
|
||||
approach is broken because if the extended header is needed because of
|
||||
a long filename, the name and prefix fields will be unable to contain
|
||||
the full pathname of the file. Therefore the files corresponding to
|
||||
both the extended header and the overridden ustar header will be
|
||||
extracted using truncated filenames, perhaps overwriting existing files
|
||||
or directories. It may be a security risk to extract a file with a
|
||||
truncated filename.
|
||||
|
||||
To avoid this problem, tarlz writes extended headers with all fields
|
||||
zeroed except size, chksum, typeflag, magic and version. This prevents
|
||||
old tar programs from extracting the extended records as a file in the
|
||||
wrong place. Tarlz also sets to zero those fields of the ustar header
|
||||
overridden by extended records.
|
||||
|
||||
If the extended header is needed because of a file size larger than
|
||||
8 GiB, the size field will be unable to contain the full size of the
|
||||
file. Therefore the file may be partially extracted, and the tool will
|
||||
issue a spurious warning about a corrupt header at the point where it
|
||||
thinks the file ends. Setting to zero the overridden size in the ustar
|
||||
header at least prevents the partial extraction and makes obvious that
|
||||
the file has been truncated.
|
||||
|
||||
|
||||
4.3 As simple as possible (but not simpler)
|
||||
===========================================
|
||||
|
||||
The tarlz format is mainly ustar. Extended pax headers are used only
|
||||
when needed because the length of a filename or link name, or the size
|
||||
of a file exceeds the limits of the ustar format. Adding extended
|
||||
headers to each member just to record subsecond timestamps seems
|
||||
wasteful for a backup format.
|
||||
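The rule above can be sketched as a simple test. This is an
illustration in Python under stated assumptions, not tarlz's
implementation: the function names are hypothetical, and the limits
used are the ustar field widths (100 bytes for name and linkname, 155
bytes for prefix, 11 octal digits for size).

```python
def split_ustar_path(path: bytes):
    """Return (prefix, name) if 'path' fits in the ustar fields, else None."""
    if len(path) <= 100:
        return b"", path
    # Try every '/' as a split point; the slash itself is not stored.
    for i in range(len(path)):
        if path[i:i + 1] == b"/":
            prefix, name = path[:i], path[i + 1:]
            if 0 < len(prefix) <= 155 and 0 < len(name) <= 100:
                return prefix, name
    return None


def needs_extended_header(path: bytes, linkname: bytes, size: int) -> bool:
    """True if any value exceeds what a plain ustar header can hold."""
    return (split_ustar_path(path) is None
            or len(linkname) > 100
            or size >= 8 ** 11)
```

A member whose values all fit is written with a plain ustar header and
no extended records.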
|
||||
|
||||
4.4 Avoid misconversions to/from UTF-8
======================================

There is no portable way to tell what charset a text string is encoded
in. Therefore, tarlz stores all fields representing text strings
as-is, without conversion to UTF-8 or any other transformation. This
prevents accidental double UTF-8 conversions. If the need arises, this
behavior will be adjusted with a command line option in the future.
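A small demonstration of the double conversion that storing strings
as-is avoids. The sketch is in Python with hypothetical function names;
it assumes a tool that wrongly takes a filename already in UTF-8 to be
Latin-1 and converts it again.

```python
def store_as_is(raw_name: bytes) -> bytes:
    """Keep the bytes of the name unchanged."""
    return raw_name


def convert_assuming_latin1(raw_name: bytes) -> bytes:
    """Wrongly treat the bytes as Latin-1 and convert them to UTF-8."""
    return raw_name.decode("latin-1").encode("utf-8")


# A filename that is already encoded in UTF-8 on disk ("caf" + e-acute).
name_on_disk = "caf\u00e9".encode("utf-8")

kept = store_as_is(name_on_disk)                  # bytes are unchanged
mangled = convert_assuming_latin1(name_on_disk)   # e-acute becomes 4 bytes
```

Here 'kept' still decodes to the original name, while 'mangled' decodes
to a different string in which the accented letter has been replaced by
two unrelated characters.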
|
||||
|
||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Amendments to pax format, Up: Top

5 A small tutorial with examples
********************************

Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
@ -232,7 +602,7 @@ Example 4: Create a compressed appendable archive containing directories
|
|||
'dir1', 'dir2' and 'dir3' with a separate lzip member per directory.
Then append files 'a', 'b', 'c', 'd' and 'e' to the archive, all of
them contained in a single lzip member. The resulting archive
-'archive.tar.lz' contains 5 lzip members (including the eof member).
+'archive.tar.lz' contains 5 lzip members (including the EOF member).

tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
tarlz --asolid -rf archive.tar.lz a b c d e
|
@ -240,7 +610,7 @@ them contained in a single lzip member. The resulting archive
|
|||
|
||||
Example 5: Create a solidly compressed archive 'archive.tar.lz'
containing files 'a', 'b' and 'c'. Note that no more files can be later
-appended to the archive without decompressing it first.
+appended to the archive.

tarlz --solid -cf archive.tar.lz a b c
|
||||
|
@ -263,7 +633,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
|
|||
|
||||
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top

-4 Reporting bugs
+6 Reporting bugs
****************

There are probably bugs in tarlz. There are certainly errors and
|
@ -284,8 +654,11 @@ Concept index
|
|||
|